Dataset

The dataset was downloaded from Kaggle. The data come from:

Zieba, M., Tomczak, S. K., & Tomczak, J. M. (2016). Ensemble Boosted Trees with Synthetic Features Generation in Application to Bankruptcy Prediction. Expert Systems with Applications.

The bankrupt companies come from the years 2000-2012, and the still-operating companies from the years 2007-2013.

The dataset consists of 5 files. Each file contains the values of 64 financial ratios and a binary variable `class` indicating whether the company declared bankruptcy after 5, 4, 3 or 2 years, or after 1 year, respectively (one forecasting horizon per file). For each year there are between 5,000 and 10,000 companies.

### libraries
library(readr)
library(psych)
library(naniar)
library(corrplot)
library(e1071)
library(polycor)

# all 64 financial ratios (Attr1 to Attr64) are read as doubles, the target `class` as an integer
X1year <- read_csv("../dane/1year.csv",
                   col_types = cols(.default = col_double(),
                                    class = col_integer()))
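The call above reads only `1year.csv`. Assuming the other four files follow the same naming pattern (`2year.csv` to `5year.csv`) and sit in the same `../dane` directory, which the text does not state, all five horizons could be loaded in one step. A minimal sketch with a hypothetical helper `readAllYears`, using the same column types (ratios as doubles, `class` as an integer):

```r
library(readr)

# Hypothetical helper (not part of the original analysis): reads 1year.csv ... 5year.csv
# from `directory` into a named list X1year, ..., X5year.
# Assumes every file has only ratio columns plus an integer `class` column.
readAllYears <- function(directory = "../dane")
{
  files <- file.path(directory, paste0(1:5, "year.csv"))
  data <- lapply(files, read_csv,
                 col_types = cols(.default = col_double(),
                                  class = col_integer()))
  names(data) <- paste0("X", 1:5, "year")
  data
}
```

`dane <- readAllYears()` would then give the five data frames as `dane$X1year` to `dane$X5year`, so the cleaning steps below could be applied to each of them with `lapply`.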

Analysis of the dataset based on the 1year file

As the missing-data visualization shows, missing values are relatively rare overall (1.3%), but they are concentrated in particular variables and rows. The console output below gives the summary statistics and the number of missing values (`NA's`) for each variable. Variables with 70 or more missing values and rows with 5 or more missing values are removed.

##      Attr1               Attr2              Attr3               Attr4          
##  Min.   :-61.60200   Min.   :  0.0000   Min.   :-440.5000   Min.   :  0.00226  
##  1st Qu.:  0.01275   1st Qu.:  0.3195   1st Qu.:   0.0088   1st Qu.:  1.01990  
##  Median :  0.06347   Median :  0.5154   Median :   0.1672   Median :  1.42205  
##  Mean   : -0.01042   Mean   :  1.0034   Mean   :  -0.3008   Mean   :  2.61909  
##  3rd Qu.:  0.14446   3rd Qu.:  0.7250   3rd Qu.:   0.3593   3rd Qu.:  2.27018  
##  Max.   :  1.62030   Max.   :441.5000   Max.   :   0.9962   Max.   :261.50000  
##                                                             NA's   :3          
##      Attr5                Attr6               Attr7          
##  Min.   :-2722100.0   Min.   :-397.8900   Min.   :-61.60200  
##  1st Qu.:     -51.8   1st Qu.:   0.0000   1st Qu.:  0.01775  
##  Median :     -13.3   Median :   0.0000   Median :  0.07787  
##  Mean   :   -3043.1   Mean   :  -0.3866   Mean   :  0.05406  
##  3rd Qu.:      28.1   3rd Qu.:   0.1014   3rd Qu.:  0.17662  
##  Max.   :   82440.0   Max.   :   1.6774   Max.   :  9.52930  
##  NA's   :1                                                   
##      Attr8              Attr9               Attr10              Attr11        
##  Min.   : -2.0032   Min.   :  0.00142   Min.   :-440.5500   Min.   :-0.58636  
##  1st Qu.:  0.3508   1st Qu.:  1.03210   1st Qu.:   0.2557   1st Qu.: 0.02948  
##  Median :  0.8670   Median :  1.17730   Median :   0.4544   Median : 0.09463  
##  Mean   :  2.3395   Mean   :  2.06796   Mean   :   0.0328   Mean   : 0.13718  
##  3rd Qu.:  2.0095   3rd Qu.:  2.18710   3rd Qu.:   0.6542   3rd Qu.: 0.19605  
##  Max.   :260.5000   Max.   :194.18000   Max.   :  58.7250   Max.   : 9.54730  
##  NA's   :2                                                  NA's   :35        
##      Attr12             Attr13              Attr14              Attr15       
##  Min.   :-5.19700   Min.   :-607.4200   Min.   :-61.60200   Min.   :-307910  
##  1st Qu.: 0.03532   1st Qu.:   0.0293   1st Qu.:  0.01775   1st Qu.:    321  
##  Median : 0.19505   Median :   0.0667   Median :  0.07787   Median :    933  
##  Mean   : 0.65010   Mean   :  -0.6402   Mean   :  0.05406   Mean   :   6660  
##  3rd Qu.: 0.56471   3rd Qu.:   0.1270   3rd Qu.:  0.17662   3rd Qu.:   2408  
##  Max.   :30.65900   Max.   :   4.9366   Max.   :  9.52930   Max.   :3599100  
##  NA's   :3                                                  NA's   :1        
##      Attr16             Attr17              Attr18              Attr19         
##  Min.   :-1.51870   Min.   :  0.00226   Min.   :-61.60200   Min.   :-622.0600  
##  1st Qu.: 0.09293   1st Qu.:  1.37820   1st Qu.:  0.01775   1st Qu.:   0.0089  
##  Median : 0.25305   Median :  1.93650   Median :  0.07787   Median :   0.0410  
##  Mean   : 0.74721   Mean   :  3.45308   Mean   :  0.05406   Mean   :  -0.6932  
##  3rd Qu.: 0.62628   3rd Qu.:  3.12185   3rd Qu.:  0.17662   3rd Qu.:   0.0932  
##  Max.   :31.58700   Max.   :261.50000   Max.   :  9.52930   Max.   :   4.6252  
##  NA's   :2          NA's   :2                                                  
##      Attr20             Attr21              Attr22             Attr23         
##  Min.   :    0.00   Min.   :   0.2368   Min.   :-0.49360   Min.   :-634.5900  
##  1st Qu.:   16.45   1st Qu.:   1.0167   1st Qu.: 0.02046   1st Qu.:   0.0063  
##  Median :   37.56   Median :   1.1324   Median : 0.08111   Median :   0.0331  
##  Mean   :   79.53   Mean   :   2.7854   Mean   : 0.12261   Mean   :  -0.7168  
##  3rd Qu.:   62.58   3rd Qu.:   1.2789   3rd Qu.: 0.18003   3rd Qu.:   0.0773  
##  Max.   :25271.00   Max.   :1088.3000   Max.   : 6.61680   Max.   :   4.6252  
##                     NA's   :274                                               
##      Attr24              Attr25              Attr26             Attr27        
##  Min.   :-61.60200   Min.   :-459.5600   Min.   :-1.51870   Min.   :-14790.0  
##  1st Qu.:  0.02725   1st Qu.:   0.1681   1st Qu.: 0.08299   1st Qu.:     0.2  
##  Median :  0.13128   Median :   0.3522   Median : 0.22173   Median :     1.4  
##  Mean   :  0.11383   Mean   :  -0.0943   Mean   : 0.67498   Mean   :  1502.1  
##  3rd Qu.:  0.29669   3rd Qu.:   0.5716   3rd Qu.: 0.55951   3rd Qu.:     7.5  
##  Max.   :  2.53290   Max.   :  52.3290   Max.   :29.49900   Max.   :963640.0  
##  NA's   :13                              NA's   :2          NA's   :128       
##      Attr28             Attr29           Attr30             Attr31         
##  Min.   :-83.3030   Min.   :0.9764   Min.   :  -5.209   Min.   :-622.0600  
##  1st Qu.:  0.0186   1st Qu.:3.6936   1st Qu.:   0.107   1st Qu.:   0.0136  
##  Median :  0.4231   Median :4.0940   Median :   0.218   Median :   0.0456  
##  Mean   :  2.7275   Mean   :4.1442   Mean   :  10.515   Mean   :  -0.6833  
##  3rd Qu.:  1.2447   3rd Qu.:4.5536   3rd Qu.:   0.390   3rd Qu.:   0.0999  
##  Max.   :884.8500   Max.   :6.4404   Max.   :9238.900   Max.   :   4.6252  
##  NA's   :7                                                                 
##      Attr32             Attr33            Attr34             Attr35        
##  Min.   :     0.0   Min.   :  0.000   Min.   : -1.4559   Min.   :-0.55967  
##  1st Qu.:    50.9   1st Qu.:  2.974   1st Qu.:  0.2028   1st Qu.: 0.01830  
##  Median :    80.8   Median :  4.567   Median :  1.7861   Median : 0.07792  
##  Mean   :   562.0   Mean   :  7.461   Mean   :  4.2424   Mean   : 0.12217  
##  3rd Qu.:   125.0   3rd Qu.:  7.170   3rd Qu.:  4.1040   3rd Qu.: 0.18398  
##  Max.   :351630.0   Max.   :537.950   Max.   :537.9500   Max.   : 7.13970  
##  NA's   :2          NA's   :3         NA's   :2                            
##      Attr36              Attr37             Attr38              Attr39         
##  Min.   :  0.00394   Min.   : -41.728   Min.   :-440.5500   Min.   :-701.6300  
##  1st Qu.:  1.29820   1st Qu.:   1.380   1st Qu.:   0.3670   1st Qu.:   0.0098  
##  Median :  1.86510   Median :   3.451   Median :   0.5535   Median :   0.0415  
##  Mean   :  2.45535   Mean   :  51.639   Mean   :   0.1191   Mean   :  -0.7118  
##  3rd Qu.:  2.67450   3rd Qu.:   9.976   3rd Qu.:   0.7180   3rd Qu.:   0.0913  
##  Max.   :194.18000   Max.   :4770.000   Max.   :  60.1140   Max.   :   4.9681  
##                      NA's   :379                                               
##      Attr40              Attr41             Attr42              Attr43        
##  Min.   : -0.16910   Min.   :-10.9690   Min.   :-701.6300   Min.   :     4.4  
##  1st Qu.:  0.04444   1st Qu.:  0.0343   1st Qu.:   0.0112   1st Qu.:    65.3  
##  Median :  0.13541   Median :  0.0907   Median :   0.0431   Median :    96.5  
##  Mean   :  0.82839   Mean   :  0.8155   Mean   :  -0.7110   Mean   :  1112.4  
##  3rd Qu.:  0.42594   3rd Qu.:  0.2167   3rd Qu.:   0.0926   3rd Qu.:   131.8  
##  Max.   :176.45000   Max.   :328.6900   Max.   :   4.9681   Max.   :919500.0  
##  NA's   :3           NA's   :6                                                
##      Attr44             Attr45             Attr46              Attr47       
##  Min.   :     0.0   Min.   :-711.430   Min.   :  0.00059   Min.   :   0.00  
##  1st Qu.:    32.0   1st Qu.:   0.065   1st Qu.:  0.52482   1st Qu.:  17.34  
##  Median :    51.9   Median :   0.321   Median :  0.85587   Median :  40.22  
##  Mean   :  1032.8   Mean   :   5.280   Mean   :  1.79316   Mean   :  57.41  
##  3rd Qu.:    76.9   3rd Qu.:   0.938   3rd Qu.:  1.52440   3rd Qu.:  68.79  
##  Max.   :894230.0   Max.   :3730.000   Max.   :176.45000   Max.   :4510.20  
##                     NA's   :25         NA's   :4           NA's   :1        
##      Attr48             Attr49              Attr50              Attr51        
##  Min.   :-3.71400   Min.   :-716.2600   Min.   :  0.00226   Min.   :  0.0000  
##  1st Qu.:-0.02022   1st Qu.:  -0.0124   1st Qu.:  0.77364   1st Qu.:  0.2416  
##  Median : 0.04053   Median :   0.0221   Median :  1.13460   Median :  0.3956  
##  Mean   : 0.06446   Mean   :  -0.7641   Mean   :  2.15812   Mean   :  0.8968  
##  3rd Qu.: 0.13772   3rd Qu.:   0.0692   3rd Qu.:  1.80530   3rd Qu.:  0.5840  
##  Max.   : 6.29060   Max.   :   4.6567   Max.   :261.50000   Max.   :441.5000  
##                                         NA's   :2                             
##      Attr52             Attr53             Attr54             Attr55         
##  Min.   :  0.0000   Min.   : -98.122   Min.   : -82.303   Min.   :-189030.0  
##  1st Qu.:  0.1391   1st Qu.:   0.666   1st Qu.:   0.922   1st Qu.:      5.7  
##  Median :  0.2167   Median :   1.131   Median :   1.328   Median :   1280.9  
##  Mean   :  0.9877   Mean   :   7.532   Mean   :   7.950   Mean   :   5478.4  
##  3rd Qu.:  0.3361   3rd Qu.:   1.951   3rd Qu.:   2.113   3rd Qu.:   4914.4  
##  Max.   :453.9600   Max.   :4247.000   Max.   :4347.500   Max.   : 537580.0  
##  NA's   :1          NA's   :7          NA's   :7                             
##      Attr56              Attr57               Attr58        
##  Min.   :-701.6300   Min.   :-315.37000   Min.   :  0.0000  
##  1st Qu.:   0.0169   1st Qu.:   0.03944   1st Qu.:  0.8737  
##  Median :   0.0553   Median :   0.16610   Median :  0.9451  
##  Mean   :  -0.6815   Mean   :  -0.12735   Mean   :  1.6976  
##  3rd Qu.:   0.1299   3rd Qu.:   0.34360   3rd Qu.:  0.9835  
##  Max.   :   1.0000   Max.   :   7.26740   Max.   :702.6300  
##                                                             
##      Attr59              Attr60              Attr61             Attr62       
##  Min.   :-42.71700   Min.   :    0.014   Min.   :  0.0004   Min.   :      0  
##  1st Qu.:  0.00000   1st Qu.:    5.732   1st Qu.:  4.7431   1st Qu.:     46  
##  Median :  0.02488   Median :    9.462   Median :  7.0370   Median :     72  
##  Mean   :  0.22589   Mean   :   60.582   Mean   : 11.9421   Mean   :   7926  
##  3rd Qu.:  0.25205   3rd Qu.:   20.489   3rd Qu.: 11.3643   3rd Qu.:    116  
##  Max.   : 31.47200   Max.   :19157.000   Max.   :749.0000   Max.   :7276000  
##                      NA's   :25          NA's   :1                           
##      Attr63            Attr64              class       
##  Min.   :  0.000   Min.   :    0.000   Min.   :0.0000  
##  1st Qu.:  3.145   1st Qu.:    2.688   1st Qu.:0.0000  
##  Median :  5.068   Median :    4.778   Median :0.0000  
##  Mean   :  8.256   Mean   :   36.398   Mean   :0.2675  
##  3rd Qu.:  7.880   3rd Qu.:   10.119   3rd Qu.:1.0000  
##  Max.   :545.950   Max.   :14043.000   Max.   :1.0000  
##  NA's   :3         NA's   :7
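The two quantities used to choose the thresholds, the overall share of missing cells and the number of `NA`s per variable, can also be computed directly instead of being read off the plot and the `summary()` printout. A minimal sketch with a hypothetical helper `missingSummary` (base R only):

```r
# Hypothetical helper (not part of the original analysis):
# overall fraction of missing cells and NA counts for the columns that have any.
missingSummary <- function(data)
{
  perColumn <- colSums(is.na(data))   # number of NAs in each column
  list(
    overallShare = mean(is.na(data)),  # fraction of all cells that are NA
    perColumn    = sort(perColumn[perColumn > 0], decreasing = TRUE)
  )
}
```

For example, `s <- missingSummary(X1year)` followed by `names(s$perColumn)[s$perColumn >= 70]` lists the variables that exceed the column threshold on the data it is given.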
# removes rows with too many missing values:
# a row is kept only if it contains fewer than `threshold` NAs (default 5)
usunNAWiersz <- function(data, threshold = 5)
{
  naCount <- rowSums(is.na(data))   # number of NAs in each row
  data[naCount < threshold, , drop = FALSE]
}


# removes columns with too many missing values:
# a column is kept only if it contains fewer than `threshold` NAs (default 70)
usunNAKol <- function(data, threshold = 70)
{
  naCount <- colSums(is.na(data))   # number of NAs in each column
  data[, naCount < threshold, drop = FALSE]
}

X1year  <- usunNAWiersz(X1year)
X1year  <- usunNAKol(X1year)

After removal

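After removing the columns and rows with many missing values, about 0.1% of the values are still missing. 6,929 observations remain, and the summary below lists 61 financial ratios (Attr21, Attr27 and Attr37 were dropped) plus the `class` variable.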

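The remaining gaps could additionally be filled with the mean of the group (class) the observation belongs to. The summary statistics also show that some attributes contain strongly outlying observations; for example, Attr5 ranges from -102,660 to 82,440 with a median of -13.66, and Attr62 reaches 7,276,000 with a median of 72.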
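Filling the gaps with the group mean is only mentioned, not carried out, in the text. A minimal sketch of what it could look like, assuming the groups are the two values of `class`; `imputeGroupMean` is a hypothetical helper that uses base R's `ave()` to compute, for each row, the mean of the column within that row's class:

```r
# Hypothetical helper (not part of the original analysis): replaces NAs in every
# predictor with the mean of that predictor within the observation's group.
imputeGroupMean <- function(data, group = "class")
{
  data <- as.data.frame(data)
  predictors <- setdiff(names(data), group)
  for (col in predictors)
  {
    # for each row: mean of `col` over the rows of the same group, ignoring NAs
    groupMeans <- ave(data[[col]], data[[group]],
                      FUN = function(v) mean(v, na.rm = TRUE))
    missing <- is.na(data[[col]])
    data[[col]][missing] <- groupMeans[missing]
  }
  data
}
```

Because the groups are defined by the target, this imputation uses information about `class`; in a predictive setting the group means should be computed on the training data only.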

vis_miss(X1year)

summary(X1year)
##      Attr1              Attr2             Attr3               Attr4        
##  Min.   :-1.37270   Min.   :0.01149   Min.   :-2.364000   Min.   : 0.1537  
##  1st Qu.: 0.01336   1st Qu.:0.32045   1st Qu.: 0.008579   1st Qu.: 1.0184  
##  Median : 0.06429   Median :0.51667   Median : 0.165170   Median : 1.4210  
##  Mean   : 0.09045   Mean   :0.54478   Mean   : 0.155267   Mean   : 2.3429  
##  3rd Qu.: 0.14553   3rd Qu.:0.72481   3rd Qu.: 0.355935   3rd Qu.: 2.2654  
##  Max.   : 1.62030   Max.   :3.33570   Max.   : 0.934540   Max.   :88.9700  
##                                                                            
##      Attr5                Attr6              Attr7              Attr8        
##  Min.   :-102660.00   Min.   :-3.48480   Min.   :-1.37270   Min.   :-2.0032  
##  1st Qu.:    -51.85   1st Qu.: 0.00000   1st Qu.: 0.01837   1st Qu.: 0.3530  
##  Median :    -13.66   Median : 0.00000   Median : 0.07809   Median : 0.8665  
##  Mean   :    -34.96   Mean   : 0.02897   Mean   : 0.10851   Mean   : 2.0574  
##  3rd Qu.:     27.72   3rd Qu.: 0.10174   3rd Qu.: 0.17717   3rd Qu.: 2.0061  
##  Max.   :  82440.00   Max.   : 1.67740   Max.   : 1.62030   Max.   :86.0380  
##                                                                              
##      Attr9              Attr10            Attr11             Attr12        
##  Min.   : 0.00142   Min.   :-2.3357   Min.   :-0.58636   Min.   :-5.19700  
##  1st Qu.: 1.03208   1st Qu.: 0.2558   1st Qu.: 0.03000   1st Qu.: 0.03682  
##  Median : 1.17705   Median : 0.4539   Median : 0.09463   Median : 0.19659  
##  Mean   : 1.87343   Mean   : 0.4297   Mean   : 0.12645   Mean   : 0.63495  
##  3rd Qu.: 2.18750   3rd Qu.: 0.6533   3rd Qu.: 0.19420   3rd Qu.: 0.55784  
##  Max.   :48.00500   Max.   : 0.9884   Max.   : 1.62820   Max.   :30.65900  
##                                       NA's   :34                           
##      Attr13              Attr14             Attr15              Attr16        
##  Min.   :-607.4200   Min.   :-1.37270   Min.   :-307910.0   Min.   :-1.51870  
##  1st Qu.:   0.0300   1st Qu.: 0.01837   1st Qu.:    324.9   1st Qu.: 0.09353  
##  Median :   0.0677   Median : 0.07809   Median :    932.9   Median : 0.25348  
##  Mean   :  -0.5996   Mean   : 0.10851   Mean   :   2869.2   Mean   : 0.73412  
##  3rd Qu.:   0.1283   3rd Qu.: 0.17717   3rd Qu.:   2384.4   3rd Qu.: 0.62759  
##  Max.   :   4.9366   Max.   : 1.62030   Max.   : 681770.0   Max.   :31.58700  
##                                                                               
##      Attr17            Attr18             Attr19              Attr20        
##  Min.   : 0.2998   Min.   :-1.37270   Min.   :-622.0600   Min.   :    0.00  
##  1st Qu.: 1.3797   1st Qu.: 0.01837   1st Qu.:   0.0092   1st Qu.:   16.94  
##  Median : 1.9354   Median : 0.07809   Median :   0.0411   Median :   38.24  
##  Mean   : 3.1721   Mean   : 0.10851   Mean   :  -0.6534   Mean   :   80.44  
##  3rd Qu.: 3.1206   3rd Qu.: 0.17717   3rd Qu.:   0.0937   3rd Qu.:   63.07  
##  Max.   :87.0460   Max.   : 1.62030   Max.   :   4.6252   Max.   :25271.00  
##                                                                             
##      Attr22             Attr23              Attr24             Attr25       
##  Min.   :-0.49360   Min.   :-634.5900   Min.   :-1.80250   Min.   :-3.2055  
##  1st Qu.: 0.02035   1st Qu.:   0.0065   1st Qu.: 0.02839   1st Qu.: 0.1707  
##  Median : 0.08100   Median :   0.0333   Median : 0.13164   Median : 0.3522  
##  Mean   : 0.11529   Mean   :  -0.6769   Mean   : 0.17976   Mean   : 0.3300  
##  3rd Qu.: 0.17851   3rd Qu.:   0.0780   3rd Qu.: 0.29776   3rd Qu.: 0.5713  
##  Max.   : 1.62820   Max.   :   4.6252   Max.   : 2.53290   Max.   : 0.9773  
##                                         NA's   :11                          
##      Attr26             Attr28             Attr29          Attr30        
##  Min.   :-1.51870   Min.   :-83.3030   Min.   :1.964   Min.   :  -5.209  
##  1st Qu.: 0.08446   1st Qu.:  0.0181   1st Qu.:3.701   1st Qu.:   0.109  
##  Median : 0.22251   Median :  0.4200   Median :4.101   Median :   0.220  
##  Mean   : 0.66320   Mean   :  2.7204   Mean   :4.161   Mean   :  10.153  
##  3rd Qu.: 0.56202   3rd Qu.:  1.2401   3rd Qu.:4.558   3rd Qu.:   0.392  
##  Max.   :29.49900   Max.   :884.8500   Max.   :6.440   Max.   :9238.900  
##                     NA's   :1                                            
##      Attr31              Attr32              Attr33              Attr34        
##  Min.   :-622.0600   Min.   :    1.355   Min.   :  0.03525   Min.   : -1.4559  
##  1st Qu.:   0.0138   1st Qu.:   51.028   1st Qu.:  2.98022   1st Qu.:  0.2204  
##  Median :   0.0458   Median :   80.800   Median :  4.56740   Median :  1.8079  
##  Mean   :  -0.6433   Mean   :  139.720   Mean   :  6.93500   Mean   :  3.6989  
##  3rd Qu.:   0.1010   3rd Qu.:  125.000   3rd Qu.:  7.17873   3rd Qu.:  4.0953  
##  Max.   :   4.6252   Max.   :10355.000   Max.   :271.02000   Max.   :271.0200  
##                                                                                
##      Attr35             Attr36             Attr38            Attr39         
##  Min.   :-0.55967   Min.   : 0.00394   Min.   :-2.3357   Min.   :-701.6300  
##  1st Qu.: 0.01828   1st Qu.: 1.30000   1st Qu.: 0.3670   1st Qu.:   0.0098  
##  Median : 0.07792   Median : 1.86465   Median : 0.5518   Median :   0.0417  
##  Mean   : 0.11375   Mean   : 2.25710   Mean   : 0.5149   Mean   :  -0.7236  
##  3rd Qu.: 0.18313   3rd Qu.: 2.66945   3rd Qu.: 0.7165   3rd Qu.:   0.0913  
##  Max.   : 2.15460   Max.   :48.12100   Max.   : 0.9906   Max.   :   4.9681  
##                                                                             
##      Attr40             Attr41              Attr42              Attr43        
##  Min.   :-0.16910   Min.   :-10.96900   Min.   :-701.6300   Min.   :     5.7  
##  1st Qu.: 0.04439   1st Qu.:  0.03489   1st Qu.:   0.0111   1st Qu.:    65.6  
##  Median : 0.13426   Median :  0.09072   Median :   0.0432   Median :    96.5  
##  Mean   : 0.63352   Mean   :  0.42651   Mean   :  -0.7224   Mean   :  1109.2  
##  3rd Qu.: 0.40955   3rd Qu.:  0.21587   3rd Qu.:   0.0929   3rd Qu.:   131.5  
##  Max.   :80.85800   Max.   : 79.31700   Max.   :   4.9681   Max.   :919500.0  
##                     NA's   :5                                                 
##      Attr44             Attr45             Attr46             Attr47       
##  Min.   :     0.5   Min.   :-291.450   Min.   : 0.03139   Min.   :   0.00  
##  1st Qu.:    32.0   1st Qu.:   0.066   1st Qu.: 0.52405   1st Qu.:  17.92  
##  Median :    51.5   Median :   0.322   Median : 0.84802   Median :  41.17  
##  Mean   :  1028.8   Mean   :   6.003   Mean   : 1.60368   Mean   :  58.09  
##  3rd Qu.:    76.6   3rd Qu.:   0.938   3rd Qu.: 1.52045   3rd Qu.:  69.67  
##  Max.   :894230.0   Max.   :3730.000   Max.   :88.97000   Max.   :4510.20  
##                     NA's   :17         NA's   :1                           
##      Attr48             Attr49              Attr50             Attr51        
##  Min.   :-3.71400   Min.   :-716.2600   Min.   : 0.06638   Min.   :0.005791  
##  1st Qu.:-0.02082   1st Qu.:  -0.0126   1st Qu.: 0.77026   1st Qu.:0.242982  
##  Median : 0.03976   Median :   0.0217   Median : 1.12835   Median :0.396130  
##  Mean   : 0.05672   Mean   :  -0.7762   Mean   : 1.88121   Mean   :0.438056  
##  3rd Qu.: 0.13702   3rd Qu.:   0.0690   3rd Qu.: 1.78880   3rd Qu.:0.584252  
##  Max.   : 1.56490   Max.   :   4.6567   Max.   :57.66100   Max.   :3.335700  
##                                                                              
##      Attr52              Attr53             Attr54             Attr55         
##  Min.   : 0.003713   Min.   :-98.1220   Min.   :-82.3030   Min.   :-189030.0  
##  1st Qu.: 0.139298   1st Qu.:  0.6643   1st Qu.:  0.9203   1st Qu.:      4.4  
##  Median : 0.218945   Median :  1.1256   Median :  1.3219   Median :   1297.0  
##  Mean   : 0.364982   Mean   :  3.0711   Mean   :  3.3683   Mean   :   5496.6  
##  3rd Qu.: 0.336088   3rd Qu.:  1.9454   3rd Qu.:  2.0981   3rd Qu.:   4955.7  
##  Max.   :28.371000   Max.   :838.4500   Max.   :838.4500   Max.   : 537580.0  
##                      NA's   :1          NA's   :1                             
##      Attr56              Attr57               Attr58        
##  Min.   :-701.6300   Min.   :-315.37000   Min.   :  0.0042  
##  1st Qu.:   0.0168   1st Qu.:   0.04114   1st Qu.:  0.8746  
##  Median :   0.0553   Median :   0.17105   Median :  0.9456  
##  Mean   :  -0.6950   Mean   :  -0.13023   Mean   :  1.7109  
##  3rd Qu.:   0.1299   3rd Qu.:   0.34414   3rd Qu.:  0.9837  
##  Max.   :   0.9979   Max.   :   7.26740   Max.   :702.6300  
##                                                             
##      Attr59              Attr60              Attr61             Attr62       
##  Min.   :-42.71700   Min.   :    0.014   Min.   :  0.0004   Min.   :      1  
##  1st Qu.:  0.00000   1st Qu.:    5.732   1st Qu.:  4.7656   1st Qu.:     46  
##  Median :  0.02683   Median :    9.435   Median :  7.0819   Median :     72  
##  Mean   :  0.21861   Mean   :   59.541   Mean   : 11.9589   Mean   :   7848  
##  3rd Qu.:  0.25268   3rd Qu.:   20.351   3rd Qu.: 11.4095   3rd Qu.:    116  
##  Max.   : 31.47200   Max.   :19157.000   Max.   :749.0000   Max.   :7276000  
##                      NA's   :17                                              
##      Attr63              Attr64             class       
##  Min.   :  0.00005   Min.   :   0.000   Min.   :0.0000  
##  1st Qu.:  3.14780   1st Qu.:   2.688   1st Qu.:0.0000  
##  Median :  5.05845   Median :   4.794   Median :0.0000  
##  Mean   :  7.71412   Mean   :  21.683   Mean   :0.2627  
##  3rd Qu.:  7.88507   3rd Qu.:   9.971   3rd Qu.:1.0000  
##  Max.   :277.72000   Max.   :3756.100   Max.   :1.0000  
##                      NA's   :1

Data scaling

Because the variables differ by several orders of magnitude, they were scaled.
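The scaling code is not shown. A minimal sketch, assuming z-score standardization with base R's `scale()` applied to all ratios but not to the binary `class`; `scalePredictors` is a hypothetical helper:

```r
# Hypothetical helper (not part of the original analysis): z-score standardization
# of every predictor, leaving the target column untouched.
scalePredictors <- function(data, target = "class")
{
  data <- as.data.frame(data)
  predictors <- setdiff(names(data), target)
  # scale() subtracts each column's mean and divides by its standard deviation;
  # NAs are ignored when computing both statistics and stay NA in the result
  data[predictors] <- scale(data[predictors])
  data
}
```

Standardizing before the outlier step means that a rule such as |z| > 3 can be read directly as "more than three standard deviations from the mean".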

Outliers

To detect and remove outliers, a procedure based on quantiles and the skewness coefficient was applied: for each variable, the quantile cut-offs depend on that variable's skewness. According to Pawełek and Pociecha (Problem of Outliers in Corporate Bankruptcy Prediction), removing outliers in this way increases the accuracy and sensitivity of the forecast.
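The skewness coefficient computed in `outlieryKwantyl` and the trimming rule it applies can be written as follows. This is read off the code rather than quoted from the cited paper: the numerator uses the $1/n$ convention of e1071's `moment()`, the denominator the sample standard deviation returned by `sd()`, and $Q_k(p)$ is the $p$-quantile of variable $k$ computed on the rows that remain after the earlier variables were processed.

$$
\gamma_k = \frac{\frac{1}{n}\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^3}{s_k^3},
\qquad
s_k = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^2}
$$

$$
\text{observation } j \text{ is removed if }
\begin{cases}
x_{jk} > Q_k(0.99), & \gamma_k > 1,\\
x_{jk} < Q_k(0.01), & \gamma_k < -1,\\
x_{jk} < Q_k(0.005) \ \text{or}\ x_{jk} > Q_k(0.995), & \text{otherwise.}
\end{cases}
$$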

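# Quantile- and skewness-based outlier removal (after Pawełek & Pociecha).
# Rows are removed sequentially, so each column's quantiles are computed
# on the rows that survived the previous columns.
outlieryKwantyl <- function(x)
{
  x <- as.data.frame(x)
  for (i in seq_len(ncol(x) - 1))          # the last column is the target `class`
  {
    kol <- as.double(x[, i])
    # skewness coefficient: third central moment / sd^3 (moment() is from e1071)
    coefAsy <- moment(kol, order = 3, center = TRUE, na.rm = TRUE) /
      sd(kol, na.rm = TRUE)^3

    # constant or all-NA columns give NaN/NA skewness: nothing to trim
    if (!is.finite(coefAsy)) next

    if (coefAsy > 1) {
      # right-skewed: drop values above the 99th percentile
      outliersIndex <- which(kol > quantile(kol, .99, na.rm = TRUE))
    } else if (coefAsy < -1) {
      # left-skewed: drop values below the 1st percentile
      outliersIndex <- which(kol < quantile(kol, .01, na.rm = TRUE))
    } else {
      # roughly symmetric: drop values in either tail (below 0.5% or above 99.5%)
      granice <- quantile(kol, c(.005, .995), na.rm = TRUE)
      outliersIndex <- which(kol < granice[1] | kol > granice[2])
    }

    if (length(outliersIndex) > 0)
    {
      x <- x[-outliersIndex, , drop = FALSE]
    }
  }
  x
}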

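# alternative rule for standardized data: drop observations lying more than
# 3 standard deviations from the mean (|z| > 3) in any predictor
outliery <- function(x)
{
  x <- as.data.frame(x)
  for (i in seq_len(ncol(x) - 1))          # the last column is the target `class`
  {
    outliersIndex <- which(abs(x[, i]) > 3)
    if (length(outliersIndex) > 0)
    {
      x <- x[-outliersIndex, , drop = FALSE]
    }
  }
  x
}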

Data after outlier removal

##      Attr1              Attr2              Attr3              Attr4         
##  Min.   :-1.94132   Min.   :-1.41803   Min.   :-2.61159   Min.   :-0.44453  
##  1st Qu.:-0.39234   1st Qu.:-0.53871   1st Qu.:-0.43773   1st Qu.:-0.27889  
##  Median :-0.13871   Median :-0.05923   Median :-0.03635   Median :-0.21033  
##  Mean   :-0.04672   Mean   :-0.01491   Mean   :-0.02854   Mean   :-0.16080  
##  3rd Qu.: 0.21103   3rd Qu.: 0.50494   3rd Qu.: 0.40781   3rd Qu.:-0.08746  
##  Max.   : 2.18058   Max.   : 2.56399   Max.   : 1.56431   Max.   : 1.38973  
##                                                                             
##      Attr5               Attr6              Attr7              Attr8         
##  Min.   :-0.384328   Min.   :-2.02664   Min.   :-1.90055   Min.   :-0.48083  
##  1st Qu.:-0.002381   1st Qu.:-0.09434   1st Qu.:-0.42203   1st Qu.:-0.32675  
##  Median : 0.003556   Median :-0.09434   Median :-0.15106   Median :-0.23927  
##  Mean   : 0.008545   Mean   : 0.10333   Mean   :-0.04818   Mean   :-0.16284  
##  3rd Qu.: 0.011028   3rd Qu.: 0.27028   3rd Qu.: 0.22297   3rd Qu.:-0.08775  
##  Max.   : 1.706262   Max.   : 2.31764   Max.   : 2.68469   Max.   : 1.35845  
##                                                                              
##      Attr9             Attr10             Attr11             Attr12        
##  Min.   :-0.6358   Min.   :-2.84065   Min.   :-2.10545   Min.   :-1.22665  
##  1st Qu.:-0.3993   1st Qu.:-0.47208   1st Qu.:-0.48155   1st Qu.:-0.27620  
##  Median :-0.3373   Median : 0.03309   Median :-0.15539   Median :-0.20889  
##  Mean   :-0.0981   Mean   : 0.02651   Mean   :-0.07396   Mean   :-0.15492  
##  3rd Qu.: 0.1086   3rd Qu.: 0.53540   3rd Qu.: 0.22156   3rd Qu.:-0.08784  
##  Max.   : 1.9470   Max.   : 1.45895   Max.   : 2.74400   Max.   : 0.80904  
##                                       NA's   :22                           
##      Attr13            Attr14             Attr15              Attr16       
##  Min.   :0.01745   Min.   :-1.90055   Min.   :-1.304566   Min.   :-0.9343  
##  1st Qu.:0.03218   1st Qu.:-0.42203   1st Qu.:-0.072174   1st Qu.:-0.2726  
##  Median :0.03371   Median :-0.15106   Median :-0.051375   Median :-0.2145  
##  Mean   :0.03424   Mean   :-0.04818   Mean   :-0.034555   Mean   :-0.1655  
##  3rd Qu.:0.03587   3rd Qu.: 0.22297   3rd Qu.:-0.005883   3rd Qu.:-0.1079  
##  Max.   :0.04619   Max.   : 2.68469   Max.   : 1.079261   Max.   : 0.8336  
##                                                                            
##      Attr17             Attr18             Attr19            Attr20        
##  Min.   :-0.46458   Min.   :-1.90055   Min.   :0.01918   Min.   :-0.09541  
##  1st Qu.:-0.33446   1st Qu.:-0.42203   1st Qu.:0.03280   1st Qu.:-0.07124  
##  Median :-0.24313   Median :-0.15106   Median :0.03426   Median :-0.04772  
##  Mean   :-0.16860   Mean   :-0.04818   Mean   :0.03467   Mean   :-0.04282  
##  3rd Qu.:-0.09802   3rd Qu.: 0.22297   3rd Qu.:0.03628   3rd Qu.:-0.02406  
##  Max.   : 1.29463   Max.   : 2.68469   Max.   :0.04537   Max.   : 0.10568  
##                                                                            
##      Attr22             Attr23            Attr24             Attr25        
##  Min.   :-1.99071   Min.   :0.01993   Min.   :-1.86892   Min.   :-2.14890  
##  1st Qu.:-0.45131   1st Qu.:0.03315   1st Qu.:-0.46759   1st Qu.:-0.33501  
##  Median :-0.14285   Median :0.03435   Median :-0.17693   Median : 0.07291  
##  Mean   :-0.03658   Mean   :0.03470   Mean   :-0.05551   Mean   : 0.07807  
##  3rd Qu.: 0.22357   3rd Qu.:0.03614   3rd Qu.: 0.29246   3rd Qu.: 0.57717  
##  Max.   : 2.87650   Max.   :0.04433   Max.   : 1.94647   Max.   : 1.50168  
##                                       NA's   :5                            
##      Attr26            Attr28             Attr29             Attr30        
##  Min.   :-0.9757   Min.   :-0.21860   Min.   :-3.01679   Min.   :-0.03411  
##  1st Qu.:-0.2675   1st Qu.:-0.08779   1st Qu.:-0.58776   1st Qu.:-0.03327  
##  Median :-0.2141   Median :-0.07782   Median :-0.03647   Median :-0.03298  
##  Mean   :-0.1649   Mean   :-0.06479   Mean   : 0.06799   Mean   :-0.03287  
##  3rd Qu.:-0.1076   3rd Qu.:-0.05849   3rd Qu.: 0.65176   3rd Qu.:-0.03259  
##  Max.   : 0.6298   Max.   : 0.17546   Max.   : 2.94338   Max.   :-0.02967  
##                                                                            
##      Attr31             Attr32             Attr33             Attr34         
##  Min.   :0.005559   Min.   :-0.23000   Min.   :-0.44873   Min.   :-0.387637  
##  1st Qu.:0.032620   1st Qu.:-0.16454   1st Qu.:-0.29537   1st Qu.:-0.320889  
##  Median :0.033970   Median :-0.11727   Median :-0.18452   Median :-0.181973  
##  Mean   :0.034406   Mean   :-0.09803   Mean   :-0.14900   Mean   :-0.131548  
##  3rd Qu.:0.036163   3rd Qu.:-0.05156   3rd Qu.:-0.04043   3rd Qu.: 0.006137  
##  Max.   :0.047032   Max.   : 0.26482   Max.   : 0.66804   Max.   : 0.712239  
##                                                                              
##      Attr35             Attr36            Attr38             Attr39       
##  Min.   :-3.40453   Min.   :-0.8861   Min.   :-3.27540   Min.   :0.01248  
##  1st Qu.:-0.42202   1st Qu.:-0.3693   1st Qu.:-0.40113   1st Qu.:0.03242  
##  Median :-0.15510   Median :-0.1569   Median : 0.09265   Median :0.03363  
##  Mean   :-0.04483   Mean   :-0.0779   Mean   : 0.03360   Mean   :0.03394  
##  3rd Qu.: 0.23921   3rd Qu.: 0.1532   3rd Qu.: 0.57358   3rd Qu.:0.03537  
##  Max.   : 2.67623   Max.   : 1.7045   Max.   : 1.46414   Max.   :0.04348  
##                                                                           
##      Attr40            Attr41             Attr42            Attr43        
##  Min.   :-0.2509   Min.   :-1.81419   Min.   :0.02186   Min.   :-0.03651  
##  1st Qu.:-0.1846   1st Qu.:-0.08859   1st Qu.:0.03245   1st Qu.:-0.03475  
##  Median :-0.1655   Median :-0.07458   Median :0.03360   Median :-0.03386  
##  Mean   :-0.1206   Mean   :-0.07707   Mean   :0.03402   Mean   :-0.03374  
##  3rd Qu.:-0.1046   3rd Qu.:-0.05072   3rd Qu.:0.03537   3rd Qu.:-0.03281  
##  Max.   : 0.5751   Max.   : 0.73197   Max.   :0.04332   Max.   :-0.02824  
##                    NA's   :2                                              
##      Attr44             Attr45             Attr46             Attr47        
##  Min.   :-0.03514   Min.   :-0.09120   Min.   :-0.36102   Min.   :-0.36572  
##  1st Qu.:-0.03421   1st Qu.:-0.04770   1st Qu.:-0.25383   1st Qu.:-0.23035  
##  Median :-0.03362   Median :-0.04605   Median :-0.19546   Median :-0.09540  
##  Mean   :-0.03346   Mean   :-0.04330   Mean   :-0.15135   Mean   :-0.06752  
##  3rd Qu.:-0.03287   3rd Qu.:-0.04235   3rd Qu.:-0.08758   3rd Qu.: 0.03786  
##  Max.   :-0.03029   Max.   : 0.04878   Max.   : 0.42030   Max.   : 0.69534  
##                     NA's   :5                                               
##      Attr48             Attr49            Attr50             Attr51        
##  Min.   :-1.47020   Min.   :0.02031   Min.   :-0.46417   Min.   :-1.38483  
##  1st Qu.:-0.27235   1st Qu.:0.03297   1st Qu.:-0.29945   1st Qu.:-0.55146  
##  Median :-0.04591   Median :0.03427   Median :-0.21822   Median :-0.12202  
##  Mean   : 0.02096   Mean   :0.03439   Mean   :-0.16681   Mean   :-0.01548  
##  3rd Qu.: 0.25641   3rd Qu.:0.03590   3rd Qu.:-0.09493   3rd Qu.: 0.42766  
##  Max.   : 2.11979   Max.   :0.04366   Max.   : 0.71053   Max.   : 3.15102  
##                                                                            
##      Attr52             Attr53             Attr54             Attr55        
##  Min.   :-0.22889   Min.   :-0.22861   Min.   :-0.23875   Min.   :-2.01059  
##  1st Qu.:-0.16013   1st Qu.:-0.08222   1st Qu.:-0.08408   1st Qu.:-0.15531  
##  Median :-0.11411   Median :-0.06983   Median :-0.07355   Median :-0.11750  
##  Mean   :-0.09143   Mean   :-0.05781   Mean   :-0.06097   Mean   :-0.02872  
##  3rd Qu.:-0.04330   3rd Qu.:-0.05074   3rd Qu.:-0.05528   3rd Qu.:-0.00717  
##  Max.   : 0.28960   Max.   : 0.20826   Max.   : 0.19784   Max.   : 3.33942  
##                                                                             
##      Attr56            Attr57             Attr58             Attr59        
##  Min.   :0.01123   Min.   :-0.08802   Min.   :-0.04748   Min.   :-0.70102  
##  1st Qu.:0.03125   1st Qu.: 0.01737   1st Qu.:-0.03545   1st Qu.:-0.10159  
##  Median :0.03269   Median : 0.02926   Median :-0.03333   Median :-0.06659  
##  Mean   :0.03332   Mean   : 0.03240   Mean   :-0.03400   Mean   : 0.06408  
##  3rd Qu.:0.03504   3rd Qu.: 0.04346   3rd Qu.:-0.03196   3rd Qu.: 0.05526  
##  Max.   :0.04748   Max.   : 0.29174   Max.   :-0.02045   Max.   : 6.65298  
##                                                                            
##      Attr60             Attr61             Attr62             Attr63        
##  Min.   :-0.08840   Min.   :-0.30911   Min.   :-0.03305   Min.   :-0.44710  
##  1st Qu.:-0.08240   1st Qu.:-0.22433   1st Qu.:-0.03292   1st Qu.:-0.31179  
##  Median :-0.07782   Median :-0.15291   Median :-0.03283   Median :-0.20147  
##  Mean   :-0.06745   Mean   :-0.09070   Mean   :-0.03279   Mean   :-0.16570  
##  3rd Qu.:-0.06470   3rd Qu.:-0.02879   3rd Qu.:-0.03269   3rd Qu.:-0.05955  
##  Max.   : 0.15715   Max.   : 1.64506   Max.   :-0.03217   Max.   : 0.61869  
##  NA's   :5                                                                  
##      Attr64             class       
##  Min.   :-0.15650   Min.   :0.0000  
##  1st Qu.:-0.13821   1st Qu.:0.0000  
##  Median :-0.12364   Median :0.0000  
##  Mean   :-0.09880   Mean   :0.2619  
##  3rd Qu.:-0.09569   3rd Qu.:1.0000  
##  Max.   : 0.50834   Max.   :1.0000  
## 

Coefficients of variation

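Practically every variable shows high variability. According to the output below, the exceptions (coefficient of variation below about 0.11 in absolute value) are Attr13, Attr19, Attr23, Attr30, Attr31, Attr39, Attr42, Attr43, Attr44, Attr49, Attr56, Attr58 and Attr62.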

##   Attr1   Attr2   Attr3   Attr4   Attr5   Attr6   Attr7   Attr8   Attr9  Attr10 
## -11.951 -46.548 -23.287  -1.133  12.549   5.038 -12.479  -1.530  -4.542  25.352 
##  Attr11  Attr12  Attr13  Attr14  Attr15  Attr16  Attr17  Attr18  Attr19  Attr20 
##  -8.694  -1.270   0.100 -12.479  -4.643  -1.042  -1.482 -12.479   0.089  -0.821 
##  Attr22  Attr23  Attr24  Attr25  Attr26  Attr28  Attr29  Attr30  Attr31  Attr32 
## -17.658   0.078 -10.947   8.330  -0.989  -0.703  13.387  -0.019   0.100  -0.884 
##  Attr33  Attr34  Attr35  Attr36  Attr38  Attr39  Attr40  Attr41  Attr42  Attr43 
##  -1.318  -1.664 -14.236  -5.606  20.454   0.085  -0.925  -2.235   0.076  -0.040 
##  Attr44  Attr45  Attr46  Attr47  Attr48  Attr49  Attr50  Attr51  Attr52  Attr53 
##  -0.029  -0.243  -0.969  -3.011  23.488   0.082  -1.205 -46.059  -0.988  -0.807 
##  Attr54  Attr55  Attr56  Attr57  Attr58  Attr59  Attr60  Attr61  Attr62  Attr63 
##  -0.755 -14.316   0.108   0.914  -0.098   6.425  -0.434  -2.469  -0.005  -1.171 
##  Attr64   class 
##  -0.785   1.680
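The coefficients of variation above can be reproduced with a helper like the one below. This is a minimal sketch, assuming CV is defined as `sd / mean` per column, which would explain the negative values for columns with a negative mean. The function name `coef_variation`, the toy data and the 0.11 cut-off are illustrative assumptions, not code from the analysis.

```r
# Coefficient of variation (sd / mean) of every numeric column.
# A negative mean yields a negative CV, as in many values above.
coef_variation <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  vapply(num, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE),
         numeric(1))
}

# Hypothetical toy data, for illustration only
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 10.5, 9.5, 10))
cv <- round(coef_variation(toy), 3)
print(cv)

# Variables with low relative variability (assumed cut-off: |CV| < 0.11)
low_cv <- names(cv)[abs(cv) < 0.11]
print(low_cv)
```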

Replacing missing values with the group mean

Because many functions are sensitive to missing data, the missing values were replaced with the group mean (the mean of the variable within each class). The class variable was converted to a factor.
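Group-mean imputation can be sketched in base R with `ave()`. This assumes "group" means the bankruptcy class, which the text implies but does not state in code. The function `impute_group_mean` and the toy data are hypothetical.

```r
# Replace missing values in each numeric column with the mean of that column
# within the same class (group mean), then convert 'class' to a factor.
impute_group_mean <- function(df, group = "class") {
  g <- df[[group]]
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  for (col in setdiff(num_cols, group)) {
    # ave() returns, for every row, the NA-aware mean of that row's group
    grp_mean <- ave(df[[col]], g, FUN = function(x) mean(x, na.rm = TRUE))
    miss <- is.na(df[[col]])
    df[[col]][miss] <- grp_mean[miss]
  }
  df[[group]] <- factor(df[[group]])
  df
}

# Hypothetical toy data: two classes with one gap each
toy <- data.frame(x = c(1, NA, 3, 10, NA), class = c(0, 0, 0, 1, 1))
imputed <- impute_group_mean(toy)
print(imputed)
```

In this toy example, the gap in class 0 is filled with 2 (the mean of 1 and 3), and the gap in class 1 is filled with 10.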

Ratio of bankrupt to non-bankrupt companies

Correlation

The correlation matrix shows no strong correlation between any of the variables and bankruptcy. The variables are sorted in decreasing order of correlation.
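One way to obtain such a ranking is to correlate each predictor with the 0/1 class indicator. Pearson's r on a binary variable is the point-biserial correlation. This sketch sorts by absolute value, which is an assumption: the original may have sorted by the signed value. `cor_with_class` is a hypothetical helper.

```r
# Correlation of each numeric predictor with the 0/1 bankruptcy indicator
# (point-biserial correlation), ordered by decreasing absolute value.
cor_with_class <- function(df, target = "class") {
  y <- as.numeric(as.character(df[[target]]))  # works for factor or numeric 0/1
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  preds <- setdiff(num_cols, target)
  r <- vapply(df[preds], function(x) cor(x, y, use = "complete.obs"),
              numeric(1))
  r[order(abs(r), decreasing = TRUE)]
}

# Hypothetical toy data
toy <- data.frame(a = c(1, 2, 3, 4), b = c(4, 1, 3, 2),
                  class = factor(c(0, 0, 1, 1)))
r <- cor_with_class(toy)
print(round(r, 3))
```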

Correlation between variables

The variables are, however, strongly correlated with one another, so principal component analysis could be an appropriate way to reduce dimensionality.
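To see which predictors are redundant, the strongly correlated pairs can be listed directly from the correlation matrix. This is a sketch only. The 0.9 threshold and the name `high_cor_pairs` are illustrative assumptions, not values from the analysis.

```r
# Pairs of variables whose absolute correlation exceeds a threshold
# (0.9 is an assumed cut-off, not one taken from the analysis).
high_cor_pairs <- function(df, threshold = 0.9) {
  num <- df[vapply(df, is.numeric, logical(1))]
  cm <- cor(num, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cm)[idx[, 1]],
                    var2 = colnames(cm)[idx[, 2]],
                    r = cm[idx],
                    stringsAsFactors = FALSE)
  out[order(-abs(out$r)), , drop = FALSE]
}

# Hypothetical toy data: b is a multiple of a, c is only moderately related
toy <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(5, 3, 4, 1, 2))
hc <- high_cor_pairs(toy)
print(hc)
```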

Principal component analysis

Working with so many variables is difficult: it hinders visualisation and can reduce efficiency by including variables that contribute nothing to the analysis. A principal component analysis was therefore carried out.
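A table of eigenvalues with the percentage and cumulative percentage of explained variance, like the one below, can be produced with base R's `prcomp()`. This sketch assumes PCA on standardised variables (i.e. the correlation matrix). With 61 standardised variables the eigenvalues sum to 61, which matches 21.25 / 61 ≈ 34.84% for the first component. The function name and toy data are hypothetical, and the original may have used a different package.

```r
# PCA on standardised variables and the share of variance explained by each
# component. prcomp() does not accept NAs, so impute missing values first.
pca_variance_table <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  pca <- prcomp(num, center = TRUE, scale. = TRUE)
  eig <- pca$sdev^2                      # eigenvalues
  data.frame(Component = seq_along(eig),
             Eigenvalue = round(eig, 2),
             PctVarExplained = round(100 * eig / sum(eig), 2),
             CumPctVarExplained = round(100 * cumsum(eig) / sum(eig), 2))
}

# Hypothetical toy data with two perfectly correlated columns
toy <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(5, 3, 4, 1, 2))
tab <- pca_variance_table(toy)
print(tab)
```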

Component Eigenvalue PctVarExplained CumPctVarExplained
1 21.25 34.84 34.84
2 8.46 13.86 48.70
3 6.72 11.02 59.72
4 4.47 7.33 67.05
5 2.32 3.80 70.85
6 1.79 2.93 73.78
7 1.75 2.86 76.64
8 1.32 2.16 78.80
9 1.17 1.92 80.72
10 1.11 1.82 82.54
11 1.02 1.67 84.21
12 0.95 1.55 85.76
13 0.81 1.34 87.09
14 0.77 1.26 88.36
15 0.72 1.19 89.55
16 0.67 1.09 90.64
17 0.64 1.06 91.70
18 0.56 0.92 92.61
19 0.52 0.86 93.47
20 0.47 0.78 94.25
21 0.43 0.71 94.96
22 0.36 0.59 95.55
23 0.34 0.57 96.12
24 0.31 0.50 96.62
25 0.29 0.47 97.09
26 0.25 0.41 97.50
27 0.20 0.33 97.83
28 0.16 0.26 98.09
29 0.15 0.24 98.33
30 0.13 0.22 98.54
31 0.12 0.19 98.74
32 0.11 0.18 98.92
33 0.10 0.16 99.08
34 0.09 0.14 99.22
35 0.07 0.12 99.33
36 0.06 0.11 99.44
37 0.05 0.09 99.53
38 0.05 0.08 99.61
39 0.04 0.06 99.67
40 0.03 0.05 99.72
41 0.03 0.05 99.77
42 0.03 0.04 99.81
43 0.03 0.04 99.85
44 0.02 0.03 99.88
45 0.01 0.02 99.90
46 0.01 0.02 99.92
47 0.01 0.02 99.94
48 0.01 0.02 99.95
49 0.01 0.01 99.97
50 0.01 0.01 99.98
51 0.00 0.01 99.98
52 0.00 0.01 99.99
53 0.00 0.00 99.99
54 0.00 0.00 100.00
55 0.00 0.00 100.00
56 0.00 0.00 100.00
57 0.00 0.00 100.00
58 0.00 0.00 100.00
59 0.00 0.00 100.00
60 0.00 0.00 100.00
61 0.00 0.00 100.00

Scree plot

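According to the scree plot, 5 principal components would be the optimal number; together they explain about 71% of the variance (70.85% cumulatively, according to the table above).

Loadings of the variables on the first five components:
Variable PC1 PC2 PC3 PC4 PC5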
Attr1 0.89 0.39 -0.04 0.01 -0.03
Attr2 -0.58 0.68 -0.17 0.12 -0.02
Attr3 0.62 -0.46 0.14 0.48 -0.04
Attr4 0.58 -0.60 0.16 0.23 -0.04
Attr5 0.00 -0.07 -0.06 -0.03 0.11
Attr6 0.49 -0.01 0.18 -0.24 -0.14
Attr7 0.89 0.40 -0.05 0.02 -0.04
Attr8 0.49 -0.70 0.14 -0.14 0.08
Attr9 0.12 0.27 -0.65 0.34 0.05
Attr10 0.59 -0.65 0.17 -0.10 0.03
Attr11 0.86 0.41 -0.07 0.05 -0.02
Attr12 0.91 0.11 0.06 -0.12 -0.03
Attr13 0.77 0.17 0.36 -0.24 0.06
Attr14 0.89 0.40 -0.05 0.02 -0.04
Attr15 -0.11 0.14 -0.01 0.02 -0.05
Attr16 0.90 -0.04 0.05 -0.14 0.01
Attr17 0.48 -0.70 0.14 -0.15 0.08
Attr18 0.89 0.40 -0.05 0.02 -0.04
Attr19 0.86 0.32 0.26 -0.11 0.03
Attr20 -0.14 -0.12 0.60 0.32 -0.62
Attr22 0.84 0.42 -0.06 0.01 -0.07
Attr23 0.85 0.32 0.25 -0.12 0.03
Attr24 0.83 0.23 0.02 -0.02 -0.08
Attr25 0.57 -0.49 0.26 -0.11 -0.03
Attr26 0.89 -0.06 0.06 -0.15 0.02
Attr28 0.40 -0.14 0.02 0.80 0.05
Attr29 -0.08 -0.14 0.47 -0.36 0.00
Attr30 -0.57 0.34 0.47 -0.14 0.03
Attr31 0.80 0.31 0.23 -0.08 0.05
Attr32 -0.50 0.47 0.62 0.06 0.15
Attr33 0.48 -0.50 -0.58 -0.08 -0.12
Attr34 0.30 -0.04 -0.49 0.30 0.11
Attr35 0.79 0.42 -0.05 0.00 -0.08
Attr36 0.10 0.29 -0.81 0.24 -0.08
Attr38 0.51 -0.62 0.18 -0.27 -0.02
Attr39 0.76 0.32 0.26 -0.11 -0.03
Attr40 0.48 -0.38 0.05 -0.01 0.19
Attr41 0.04 0.05 -0.02 -0.02 -0.02
Attr42 0.82 0.33 0.29 -0.12 0.00
Attr43 -0.16 -0.07 0.81 0.44 -0.02
Attr44 -0.09 0.03 0.54 0.30 0.63
Attr45 0.55 0.21 -0.07 -0.20 0.40
Attr46 0.57 -0.53 0.10 0.14 0.34
Attr47 -0.06 -0.09 0.62 0.32 -0.62
Attr48 0.80 0.44 -0.05 0.09 -0.06
Attr49 0.76 0.42 0.13 0.03 -0.04
Attr50 0.59 -0.54 0.12 0.36 0.06
Attr51 -0.48 0.66 -0.18 0.29 0.04
Attr52 -0.50 0.46 0.63 0.06 0.15
Attr53 0.44 -0.18 0.04 0.78 0.09
Attr54 0.42 -0.15 0.03 0.79 0.06
Attr55 0.25 -0.30 0.17 0.02 0.02
Attr56 0.59 0.24 0.22 -0.05 0.03
Attr57 0.50 0.52 -0.16 0.07 -0.08
Attr58 -0.62 -0.24 -0.24 0.06 -0.03
Attr59 -0.20 0.15 -0.02 -0.15 -0.04
Attr60 0.06 0.06 -0.50 -0.21 0.55
Attr61 0.11 0.00 -0.44 -0.25 -0.55
Attr62 -0.60 0.42 0.57 0.07 0.14
Attr63 0.58 -0.46 -0.52 -0.08 -0.12
Attr64 0.09 0.24 -0.31 0.76 0.04
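Loadings like those above can be computed as eigenvector × √eigenvalue. For standardised data these are the correlations between each variable and each component. Component signs are arbitrary, so individual columns may come out flipped relative to the table. This is a sketch assuming unrotated PCA. `pca_loadings` and `scree_plot` are hypothetical helpers; `psych::principal(..., nfactors = 5, rotate = "none")` should give equivalent loadings up to sign.

```r
# Loadings on the first k components: eigenvector * sqrt(eigenvalue).
pca_loadings <- function(df, k = 5) {
  num <- df[vapply(df, is.numeric, logical(1))]
  pca <- prcomp(num, center = TRUE, scale. = TRUE)
  k <- min(k, ncol(pca$rotation))
  load <- sweep(pca$rotation[, seq_len(k), drop = FALSE], 2,
                pca$sdev[seq_len(k)], `*`)
  colnames(load) <- paste0("PC", seq_len(k))
  load
}

# Scree plot: eigenvalues against component number
scree_plot <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  eig <- prcomp(num, center = TRUE, scale. = TRUE)$sdev^2
  plot(eig, type = "b", xlab = "Component", ylab = "Eigenvalue",
       main = "Scree plot")
}

# Hypothetical toy data; k = 5 is capped at the 3 available components
toy <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(5, 3, 4, 1, 2))
L <- pca_loadings(toy)
print(round(L, 2))
```

As a sanity check, summing the squared loadings of a variable over all components gives 1 (its standardised variance).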

The analysis is presented for the data from the first file; the results for the remaining files look very similar.

Dataset no. 2

I looked for a second dataset on Polish companies (bankrupt and non-bankrupt) but did not find anything suitable. The second dataset therefore covers companies from Slovakia. The data are split into 4 years and into economic sectors such as agriculture, construction, industry and trade.

The data were downloaded from https://data.mendeley.com/datasets/j89csb932y/2

They contain 63 financial ratios and bankruptcy information for more than 10,000 companies in each year.

Data for the first year

Missing values are fairly common for individual variables in this dataset.

##        V1                   V2                   V3           
##  Min.   :-3004.8300   Min.   :-1975000.0   Min.   :-81743.75  
##  1st Qu.:   -0.8925   1st Qu.:       0.0   1st Qu.:    -0.28  
##  Median :    2.0650   Median :      10.8   Median :     1.90  
##  Mean   :    0.7218   Mean   :    -818.4   Mean   :   -38.41  
##  3rd Qu.:   10.3900   3rd Qu.:      43.1   3rd Qu.:     7.27  
##  Max.   : 2346.5300   Max.   :    6124.1   Max.   :  1748.54  
##  NA's   :5            NA's   :14           NA's   :145        
##        V4                  V5                 V6                  V7          
##  Min.   : -852.430   Min.   : -851.43   Min.   : -851.430   Min.   : -8533.7  
##  1st Qu.:    0.050   1st Qu.:    0.50   1st Qu.:    0.850   1st Qu.:   142.8  
##  Median :    0.250   Median :    0.95   Median :    1.295   Median :   241.7  
##  Mean   :   13.472   Mean   :   14.89   Mean   :   16.731   Mean   :  1864.8  
##  3rd Qu.:    0.868   3rd Qu.:    1.92   3rd Qu.:    2.610   3rd Qu.:   463.1  
##  Max.   :25499.000   Max.   :25499.00   Max.   :25499.000   Max.   :889866.2  
##  NA's   :79          NA's   :79         NA's   :79          NA's   :142       
##        V8                  V9                 V10                V11         
##  Min.   :  -932.39   Min.   : -7378.26   Min.   :   -34.4   Min.   :-0.6900  
##  1st Qu.:    24.29   1st Qu.:    47.37   1st Qu.:     5.8   1st Qu.: 0.3400  
##  Median :    56.61   Median :    91.73   Median :    29.5   Median : 0.6400  
##  Mean   :   338.68   Mean   :   516.60   Mean   :   549.4   Mean   : 0.7967  
##  3rd Qu.:   108.75   3rd Qu.:   179.40   3rd Qu.:    78.3   3rd Qu.: 0.8800  
##  Max.   :193468.12   Max.   :141102.85   Max.   :882843.8   Max.   :83.1600  
##  NA's   :221         NA's   :150         NA's   :435        NA's   :30       
##       V12                 V13                 V14                 V15         
##  Min.   : -1506.55   Min.   : -1505.55   Min.   :-2675.040   Min.   : -69.32  
##  1st Qu.:     0.25   1st Qu.:     1.28   1st Qu.:    6.795   1st Qu.:  35.60  
##  Median :     1.15   Median :     2.16   Median :   20.405   Median :  64.88  
##  Mean   :   112.99   Mean   :   112.67   Mean   :   27.925   Mean   :  68.86  
##  3rd Qu.:     4.19   3rd Qu.:     5.20   3rd Qu.:   41.947   3rd Qu.:  88.34  
##  Max.   :271726.00   Max.   :271727.00   Max.   : 2346.530   Max.   :2738.74  
##  NA's   :39          NA's   :14          NA's   :5           NA's   :7        
##       V16                V17                 V18               V19         
##  Min.   :-5977.50   Min.   :-5290.460   Min.   :-274698   Min.   :-13.640  
##  1st Qu.:    3.44   1st Qu.:    0.608   1st Qu.:   7906   1st Qu.:  0.000  
##  Median :   16.10   Median :    1.150   Median :  14618   Median :  0.000  
##  Mean   :   83.18   Mean   :    0.628   Mean   :  21669   Mean   :  5.641  
##  3rd Qu.:   46.47   3rd Qu.:    2.373   3rd Qu.:  24869   3rd Qu.:  4.790  
##  Max.   :43100.00   Max.   :  566.640   Max.   : 713965   Max.   :183.530  
##  NA's   :70         NA's   :473         NA's   :1780      NA's   :42       
##       V20                 V21                 V22           
##  Min.   :  -39.690   Min.   : -9839.66   Min.   :-7946.350  
##  1st Qu.:    4.285   1st Qu.:     8.22   1st Qu.:   -0.060  
##  Median :   12.280   Median :    51.75   Median :    2.335  
##  Mean   :   28.767   Mean   :   135.25   Mean   :   -0.161  
##  3rd Qu.:   24.650   3rd Qu.:    81.40   3rd Qu.:   10.530  
##  Max.   :21118.750   Max.   :178060.62   Max.   : 2550.000  
##  NA's   :142         NA's   :57          NA's   :7          
##       V23                 V24                 V25                V26          
##  Min.   :-924266.7   Min.   :-38705.37   Min.   :-844.000   Min.   :-843.000  
##  1st Qu.:      0.3   1st Qu.:     0.01   1st Qu.:   0.050   1st Qu.:   0.480  
##  Median :     10.8   Median :     1.77   Median :   0.210   Median :   0.940  
##  Mean   :   -377.7   Mean   :   -20.45   Mean   :   3.687   Mean   :   6.840  
##  3rd Qu.:     39.3   3rd Qu.:     6.87   3rd Qu.:   0.850   3rd Qu.:   1.925  
##  Max.   :  61974.8   Max.   :  1261.67   Max.   :2949.670   Max.   :4591.570  
##  NA's   :14          NA's   :97          NA's   :63         NA's   :58        
##       V27                V28                 V29                V30           
##  Min.   :-843.000   Min.   :    -50.1   Min.   :   -78.3   Min.   :  -3391.2  
##  1st Qu.:   0.870   1st Qu.:    140.4   1st Qu.:    25.4   1st Qu.:     43.6  
##  Median :   1.300   Median :    231.8   Median :    53.4   Median :     85.4  
##  Mean   :   7.529   Mean   :   3196.8   Mean   :   560.7   Mean   :   1429.9  
##  3rd Qu.:   2.572   3rd Qu.:    445.3   3rd Qu.:    98.3   3rd Qu.:    168.5  
##  Max.   :4591.570   Max.   :2561227.8   Max.   :681721.9   Max.   :1482960.6  
##  NA's   :61         NA's   :97          NA's   :170        NA's   :105        
##       V31                V32                V33                V34          
##  Min.   :   -6.62   Min.   : -6.2900   Min.   :-6841.34   Min.   :-7269.17  
##  1st Qu.:    6.24   1st Qu.:  0.3500   1st Qu.:    0.28   1st Qu.:    1.30  
##  Median :   30.93   Median :  0.6400   Median :    1.19   Median :    2.21  
##  Mean   :  155.69   Mean   :  0.9428   Mean   :   39.08   Mean   :   38.17  
##  3rd Qu.:   80.99   3rd Qu.:  0.8800   3rd Qu.:    4.13   3rd Qu.:    5.12  
##  Max.   :77371.00   Max.   :134.2800   Max.   :88584.00   Max.   :88585.00  
##  NA's   :404        NA's   :26         NA's   :33         NA's   :14        
##       V35                 V36                V37                 V38          
##  Min.   :-6709.380   Min.   : -628.93   Min.   : -9580.00   Min.   :-1457.45  
##  1st Qu.:    7.655   1st Qu.:   35.97   1st Qu.:     4.04   1st Qu.:    0.63  
##  Median :   21.670   Median :   66.09   Median :    16.34   Median :    1.18  
##  Mean   :   29.709   Mean   :   76.40   Mean   :   111.10   Mean   :   21.08  
##  3rd Qu.:   44.670   3rd Qu.:   88.60   3rd Qu.:    44.67   3rd Qu.:    2.49  
##  Max.   : 2550.000   Max.   :13427.61   Max.   :189291.67   Max.   :20078.50  
##  NA's   :7           NA's   :11         NA's   :48          NA's   :436       
##       V39              V40                V41               V42          
##  Min.   :-73059   Min.   : -22.380   Min.   : -10.31   Min.   :-5710.18  
##  1st Qu.:  8352   1st Qu.:   0.000   1st Qu.:   4.69   1st Qu.:   18.78  
##  Median : 15767   Median :   0.000   Median :  12.23   Median :   55.69  
##  Mean   : 21323   Mean   :   6.697   Mean   :  22.48   Mean   :   71.28  
##  3rd Qu.: 26070   3rd Qu.:   6.630   3rd Qu.:  24.32   3rd Qu.:   81.63  
##  Max.   :249036   Max.   :1483.850   Max.   :6574.96   Max.   :16890.70  
##  NA's   :1985     NA's   :16         NA's   :96        NA's   :47        
##       V43                  V44                 V45            
##  Min.   :-146527.27   Min.   :-83175.00   Min.   :-15939.680  
##  1st Qu.:     -0.59   1st Qu.:     0.12   1st Qu.:    -0.330  
##  Median :      1.98   Median :     8.70   Median :     1.450  
##  Mean   :    -63.12   Mean   :   -45.87   Mean   :    -3.726  
##  3rd Qu.:      8.66   3rd Qu.:    34.53   3rd Qu.:     6.605  
##  Max.   :    576.15   Max.   : 14222.09   Max.   :  6810.280  
##  NA's   :17           NA's   :14          NA's   :86          
##       V46                V47                 V48                 V49           
##  Min.   :-8911.00   Min.   :-8911.000   Min.   :-8911.000   Min.   :  -1484.0  
##  1st Qu.:    0.05   1st Qu.:    0.480   1st Qu.:    0.850   1st Qu.:    145.7  
##  Median :    0.22   Median :    0.940   Median :    1.290   Median :    245.9  
##  Mean   :    1.92   Mean   :    4.816   Mean   :    5.491   Mean   :   2797.2  
##  3rd Qu.:    0.87   3rd Qu.:    1.980   3rd Qu.:    2.650   3rd Qu.:    478.9  
##  Max.   : 5354.33   Max.   : 5354.330   Max.   : 5393.670   Max.   :3045765.5  
##  NA's   :67         NA's   :57          NA's   :61          NA's   :88         
##       V50                V51                 V52                 V53          
##  Min.   : -1444.2   Min.   : -19790.6   Min.   :   -26.04   Min.   :  -1.840  
##  1st Qu.:    26.3   1st Qu.:     43.3   1st Qu.:    11.38   1st Qu.:   0.340  
##  Median :    54.8   Median :     85.2   Median :    34.22   Median :   0.640  
##  Mean   :   689.2   Mean   :   1216.0   Mean   :   184.12   Mean   :   1.376  
##  3rd Qu.:   103.6   3rd Qu.:    177.6   3rd Qu.:    87.30   3rd Qu.:   0.880  
##  Max.   :680895.6   Max.   :1046331.9   Max.   :148259.72   Max.   :1151.040  
##  NA's   :174        NA's   :101         NA's   :521         NA's   :44        
##       V54                V55                V56                  V57           
##  Min.   :-2194.34   Min.   :-2193.34   Min.   :-146527.27   Min.   :  -184.34  
##  1st Qu.:    0.27   1st Qu.:    1.27   1st Qu.:      6.88   1st Qu.:    35.08  
##  Median :    1.24   Median :    2.25   Median :     20.68   Median :    65.15  
##  Mean   :   36.97   Mean   :   38.34   Mean   :    -19.25   Mean   :   125.01  
##  3rd Qu.:    4.04   3rd Qu.:    4.98   3rd Qu.:     43.12   3rd Qu.:    88.64  
##  Max.   :77121.00   Max.   :77122.00   Max.   :  26220.79   Max.   :115103.90  
##  NA's   :41         NA's   :14         NA's   :17           NA's   :20         
##       V58                 V59                V60               V61         
##  Min.   :-2472.730   Min.   :-218.040   Min.   :-522440   Min.   :-11.870  
##  1st Qu.:    3.175   1st Qu.:   0.630   1st Qu.:   7910   1st Qu.:  0.000  
##  Median :   14.380   Median :   1.190   Median :  14747   Median :  0.000  
##  Mean   :   56.867   Mean   :   4.796   Mean   :  18348   Mean   :  6.388  
##  3rd Qu.:   40.288   3rd Qu.:   2.590   3rd Qu.:  24163   3rd Qu.:  6.520  
##  Max.   :11295.900   Max.   :1288.490   Max.   : 747655   Max.   :257.630  
##  NA's   :47          NA's   :417        NA's   :801       NA's   :20       
##       V62               V63                class        
##  Min.   : -20.96   Min.   :-23692.80   Min.   :0.00000  
##  1st Qu.:   4.93   1st Qu.:    21.68   1st Qu.:0.00000  
##  Median :  12.62   Median :    58.16   Median :0.00000  
##  Mean   :  22.69   Mean   :    39.62   Mean   :0.02764  
##  3rd Qu.:  24.17   3rd Qu.:    83.14   3rd Qu.:0.00000  
##  Max.   :7173.30   Max.   : 20746.67   Max.   :1.00000  
##  NA's   :85        NA's   :55
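The NA counts scattered across the summary above can be collected into a single table sorted by incompleteness. This is a minimal base-R sketch; `na_summary` and the toy data are hypothetical. `naniar::miss_var_summary()` produces a similar table.

```r
# Count and percentage of missing values per variable, most incomplete first.
na_summary <- function(df) {
  n_miss <- colSums(is.na(df))
  out <- data.frame(variable = names(n_miss),
                    n_miss = as.integer(n_miss),
                    pct_miss = round(100 * n_miss / nrow(df), 2),
                    stringsAsFactors = FALSE)
  out <- out[order(-out$n_miss), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Hypothetical toy data
toy <- data.frame(V1 = c(1, NA, 3, 4), V2 = c(NA, NA, 3, 4), V3 = 1:4)
s <- na_summary(toy)
print(s)
```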

The same procedure as for the first dataset was followed: variables and observations with a very large number of missing values were removed.

Missing values are now scarce, about 0.1% of the data; the remaining gaps can be replaced with the group mean.
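Removing sparse variables and then sparse observations, and checking the remaining share of missing cells, can be sketched as follows. Both thresholds are hypothetical placeholders, because the text does not state the cut-offs used. Judging by the coefficient-of-variation output further down, the dropped variables were those with roughly 400 or more NAs. `drop_sparse` and `overall_na_share` are illustrative names.

```r
# Drop variables, then observations, whose share of missing values exceeds
# the given thresholds (hypothetical defaults, not the analysis's values).
drop_sparse <- function(df, max_var_na = 0.03, max_row_na = 0.1) {
  df <- df[, colMeans(is.na(df)) <= max_var_na, drop = FALSE]
  df[rowMeans(is.na(df)) <= max_row_na, , drop = FALSE]
}

# Overall share of missing cells (e.g. to check a figure such as 0.1%)
overall_na_share <- function(df) mean(is.na(df))

# Hypothetical toy data
toy <- data.frame(a = c(1, 2, 3, 4), b = c(NA, NA, NA, 4), c = c(1, NA, 3, 4))
cleaned <- drop_sparse(toy, max_var_na = 0.3, max_row_na = 0.1)
print(cleaned)
print(overall_na_share(toy))      # 4 of 12 cells are missing
print(overall_na_share(cleaned))
```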

Class counts (bankrupt vs non-bankrupt)

Data scaling

The data were scaled, and outliers were removed with the same function as before. Missing values were replaced with the group mean, and the class variable was converted to a factor.
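The scaling step can be sketched with base R's `scale()`. The outlier-removal function mentioned above is not shown in this section and is not reproduced here. `scale_predictors` and the toy data are hypothetical.

```r
# Standardise every numeric predictor to mean 0 and sd 1 with scale(),
# leaving the target out, then convert the target to a factor.
# scale() ignores NAs when computing the mean and sd, and keeps them.
scale_predictors <- function(df, target = "class") {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  preds <- setdiff(num_cols, target)
  df[preds] <- lapply(df[preds], function(x) as.numeric(scale(x)))
  df[[target]] <- factor(df[[target]])
  df
}

# Hypothetical toy data
toy <- data.frame(x = c(1, 2, 3), y = c(10, 20, 60), class = c(0, 1, 0))
scaled <- scale_predictors(toy)
print(scaled)
```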

Coefficients of variation

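Practically every variable shows high variability. According to the output below, the exceptions (coefficient of variation below about 0.11 in absolute value) are V2, V12, V13, V28, V29, V49 and V50.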

##       V1       V2       V3       V4       V5       V6       V7       V8 
##    4.601    0.073    2.024   -3.426   -1.829   -1.976   -0.309   -0.272 
##       V9      V11      V12      V13      V14      V15      V16      V19 
##   -0.346   -1.933   -0.054   -0.054 -111.878   -5.548   -1.881  -27.986 
##      V20      V21      V22      V23      V24      V25      V26      V27 
##   -2.096   -0.843    5.501    0.187    0.289   -1.502   -0.329   -0.398 
##      V28      V29      V30      V32      V33      V34      V35      V36 
##   -0.090   -0.099   -0.163   -1.781   -0.160   -0.169  -50.329   -5.032 
##      V37      V40      V41      V42      V43      V44      V45      V46 
##   -3.152  -20.352   -1.355   -5.266    3.505    0.868    0.939   -1.084 
##      V47      V48      V49      V50      V51      V53      V54      V55 
##   -0.355   -0.431   -0.080   -0.077   -0.215   -1.716   -0.146   -0.145 
##      V56      V57      V58      V61      V62      V63    class 
##   52.841   -3.805   -3.059  -30.646   -1.352 -258.875   10.510

Correlation matrix

The correlation matrix shows no strong correlation between any of the variables and bankruptcy. The variables are sorted in decreasing order of correlation.

Correlation between variables

There are fewer bankrupt companies than in the Polish dataset: in the first-year summary above, the mean of class (the share of bankrupt companies) is only about 2.8%.

Given how few bankrupt companies there are relative to "healthy" ones, I am considering whether to delete observations with missing data or instead to replace a larger share of the gaps with the mean.
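One way to inform this decision is to count how many observations of each class would be lost under complete-case deletion. If a large fraction of the already scarce bankrupt firms would disappear, imputation becomes more attractive. This is a sketch; `complete_case_loss` and the toy data are hypothetical.

```r
# For each class: how many observations there are and how many would be lost
# if every row with at least one missing value were deleted.
complete_case_loss <- function(df, target = "class") {
  cls <- factor(df[[target]])
  incomplete <- !complete.cases(df)
  total <- as.vector(table(cls))
  lost <- as.vector(table(cls[incomplete]))   # subsetting keeps all levels
  data.frame(class = levels(cls), total = total, lost = lost,
             pct_lost = round(100 * lost / total, 1),
             stringsAsFactors = FALSE)
}

# Hypothetical toy data
toy <- data.frame(x = c(1, NA, 3, NA, 5), class = c(0, 0, 0, 1, 1))
res <- complete_case_loss(toy)
print(res)
```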

Dataset no. 3

The data were downloaded from https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

They cover companies listed on the Taiwan Stock Exchange in 1999-2009: 95 financial indicators for 6,819 observations.

There are no missing values.

##  ROA(C) before interest and depreciation before interest
##  Min.   :0.2560                                         
##  1st Qu.:0.4761                                         
##  Median :0.5010                                         
##  Mean   :0.5038                                         
##  3rd Qu.:0.5340                                         
##  Max.   :0.7753                                         
##  ROA(A) before interest and % after tax
##  Min.   :0.2648                        
##  1st Qu.:0.5360                        
##  Median :0.5596                        
##  Mean   :0.5586                        
##  3rd Qu.:0.5870                        
##  Max.   :0.9847                        
##  ROA(B) before interest and depreciation after tax Operating Gross Margin
##  Min.   :0.2821                                    Min.   :0.5127        
##  1st Qu.:0.5266                                    1st Qu.:0.6000        
##  Median :0.5502                                    Median :0.6064        
##  Mean   :0.5522                                    Mean   :0.6080        
##  3rd Qu.:0.5824                                    3rd Qu.:0.6136        
##  Max.   :0.8103                                    Max.   :0.6652        
##  Realized Sales Gross Margin Operating Profit Rate Pre-tax net Interest Rate
##  Min.   :0.5127              Min.   :0.9873        Min.   :0.7651           
##  1st Qu.:0.6001              1st Qu.:0.9990        1st Qu.:0.7974           
##  Median :0.6064              Median :0.9990        Median :0.7975           
##  Mean   :0.6079              Mean   :0.9990        Mean   :0.7974           
##  3rd Qu.:0.6135              3rd Qu.:0.9991        3rd Qu.:0.7976           
##  Max.   :0.6652              Max.   :0.9996        Max.   :0.8034           
##  After-tax net Interest Rate Non-industry income and expenditure/revenue
##  Min.   :0.7789              Min.   :0.2715                             
##  1st Qu.:0.8093              1st Qu.:0.3035                             
##  Median :0.8094              Median :0.3035                             
##  Mean   :0.8093              Mean   :0.3035                             
##  3rd Qu.:0.8095              3rd Qu.:0.3036                             
##  Max.   :0.8145              Max.   :0.3130                             
##  Continuous interest rate (after tax) Operating Expense Rate
##  Min.   :0.7488                       Min.   :0.000e+00     
##  1st Qu.:0.7816                       1st Qu.:0.000e+00     
##  Median :0.7816                       Median :0.000e+00     
##  Mean   :0.7815                       Mean   :1.896e+09     
##  3rd Qu.:0.7817                       3rd Qu.:3.550e+09     
##  Max.   :0.7834                       Max.   :9.980e+09     
##  Research and development expense rate Cash flow rate  
##  Min.   :0.00e+00                      Min.   :0.3466  
##  1st Qu.:0.00e+00                      1st Qu.:0.4620  
##  Median :4.41e+08                      Median :0.4652  
##  Mean   :1.83e+09                      Mean   :0.4680  
##  3rd Qu.:3.05e+09                      3rd Qu.:0.4712  
##  Max.   :9.86e+09                      Max.   :0.6746  
##  Interest-bearing debt interest rate  Tax rate (A)     Net Value Per Share (B)
##  Min.   :0.00e+00                    Min.   :0.00000   Min.   :0.1284         
##  1st Qu.:0.00e+00                    1st Qu.:0.00000   1st Qu.:0.1739         
##  Median :0.00e+00                    Median :0.08761   Median :0.1844         
##  Mean   :1.54e+07                    Mean   :0.12487   Mean   :0.1912         
##  3rd Qu.:0.00e+00                    3rd Qu.:0.22191   3rd Qu.:0.1995         
##  Max.   :9.90e+08                    Max.   :0.99190   Max.   :0.4759         
##  Net Value Per Share (A) Net Value Per Share (C)
##  Min.   :0.1284          Min.   :0.1284         
##  1st Qu.:0.1739          1st Qu.:0.1739         
##  Median :0.1844          Median :0.1844         
##  Mean   :0.1912          Mean   :0.1912         
##  3rd Qu.:0.1997          3rd Qu.:0.1997         
##  Max.   :0.4759          Max.   :0.4759         
##  Persistent EPS in the Last Four Seasons Cash Flow Per Share
##  Min.   :0.1096                          Min.   :0.2719     
##  1st Qu.:0.2145                          1st Qu.:0.3184     
##  Median :0.2235                          Median :0.3228     
##  Mean   :0.2277                          Mean   :0.3242     
##  3rd Qu.:0.2384                          3rd Qu.:0.3291     
##  Max.   :0.4855                          Max.   :0.4413     
##  Revenue Per Share (Yuan ¥) Operating Profit Per Share (Yuan ¥)
##  Min.   :0.0001966          Min.   :0.03835                    
##  1st Qu.:0.0157297          1st Qu.:0.09616                    
##  Median :0.0286311          Median :0.10419                    
##  Mean   :0.0385714          Mean   :0.10880                    
##  3rd Qu.:0.0455821          3rd Qu.:0.11563                    
##  Max.   :0.5777183          Max.   :0.34468                    
##  Per Share Net profit before tax (Yuan ¥)
##  Min.   :0.08872                         
##  1st Qu.:0.17078                         
##  Median :0.17914                         
##  Mean   :0.18384                         
##  3rd Qu.:0.19295                         
##  Max.   :0.50908                         
##  Realized Sales Gross Profit Growth Rate Operating Profit Growth Rate
##  Min.   :0.009889                        Min.   :0.8175              
##  1st Qu.:0.022063                        1st Qu.:0.8480              
##  Median :0.022098                        Median :0.8480              
##  Mean   :0.022221                        Mean   :0.8482              
##  3rd Qu.:0.022144                        3rd Qu.:0.8481              
##  Max.   :0.081282                        Max.   :0.9322              
##  After-tax Net Profit Growth Rate Regular Net Profit Growth Rate
##  Min.   :0.6209                   Min.   :0.6198                
##  1st Qu.:0.6893                   1st Qu.:0.6893                
##  Median :0.6894                   Median :0.6894                
##  Mean   :0.6898                   Mean   :0.6898                
##  3rd Qu.:0.6897                   3rd Qu.:0.6896                
##  Max.   :0.8782                   Max.   :0.8782                
##  Continuous Net Profit Growth Rate Total Asset Growth Rate
##  Min.   :0.1820                    Min.   :0.000e+00      
##  1st Qu.:0.2176                    1st Qu.:4.788e+09      
##  Median :0.2176                    Median :6.360e+09      
##  Mean   :0.2176                    Mean   :5.493e+09      
##  3rd Qu.:0.2176                    3rd Qu.:7.320e+09      
##  Max.   :0.2393                    Max.   :9.960e+09      
##  Net Value Growth Rate Total Asset Return Growth Rate Ratio Cash Reinvestment %
##  Min.   :0.0002136     Min.   :0.2590                       Min.   :0.3061     
##  1st Qu.:0.0004420     1st Qu.:0.2638                       1st Qu.:0.3755     
##  Median :0.0004623     Median :0.2640                       Median :0.3810     
##  Mean   :0.0005620     Mean   :0.2643                       Mean   :0.3809     
##  3rd Qu.:0.0004949     3rd Qu.:0.2644                       3rd Qu.:0.3868     
##  Max.   :0.0276540     Max.   :0.3586                       Max.   :0.4751     
##  Current Ratio        Quick Ratio        Interest Expense Ratio
##  Min.   :0.0008761   Min.   :0.000e+00   Min.   :0.5874        
##  1st Qu.:0.0079422   1st Qu.:0.000e+00   1st Qu.:0.6306        
##  Median :0.0111750   Median :0.000e+00   Median :0.6307        
##  Mean   :0.0161195   Mean   :7.683e+06   Mean   :0.6309        
##  3rd Qu.:0.0172112   3rd Qu.:0.000e+00   3rd Qu.:0.6311        
##  Max.   :0.2668079   Max.   :5.240e+09   Max.   :0.6774        
##  Total debt/Total net worth  Debt ratio %       Net worth/Assets
##  Min.   :0.000e+00          Min.   :0.0005744   Min.   :0.7113  
##  1st Qu.:0.000e+00          1st Qu.:0.0693382   1st Qu.:0.8515  
##  Median :0.000e+00          Median :0.1089579   Median :0.8910  
##  Mean   :2.669e+06          Mean   :0.1113335   Mean   :0.8887  
##  3rd Qu.:0.000e+00          3rd Qu.:0.1485473   3rd Qu.:0.9307  
##  Max.   :1.820e+09          Max.   :0.2886598   Max.   :0.9994  
##  Long-term fund suitability ratio (A) Borrowing dependency
##  Min.   :0.004853                     Min.   :0.3696      
##  1st Qu.:0.005258                     1st Qu.:0.3701      
##  Median :0.005621                     Median :0.3724      
##  Mean   :0.008926                     Mean   :0.3751      
##  3rd Qu.:0.006825                     3rd Qu.:0.3762      
##  Max.   :0.984855                     Max.   :0.6690      
##  Contingent liabilities/Net worth Operating profit/Paid-in capital
##  Min.   :0.005366                 Min.   :0.03861                 
##  1st Qu.:0.005366                 1st Qu.:0.09617                 
##  Median :0.005366                 Median :0.10417                 
##  Mean   :0.005855                 Mean   :0.10873                 
##  3rd Qu.:0.005820                 3rd Qu.:0.11529                 
##  Max.   :0.049600                 Max.   :0.34468                 
##  Net profit before tax/Paid-in capital
##  Min.   :0.08762                      
##  1st Qu.:0.16979                      
##  Median :0.17800                      
##  Mean   :0.18243                      
##  3rd Qu.:0.19126                      
##  Max.   :0.50852                      
##  Inventory and accounts receivable/Net value Total Asset Turnover
##  Min.   :0.3937                              Min.   :0.002998    
##  1st Qu.:0.3975                              1st Qu.:0.074963    
##  Median :0.4002                              Median :0.121439    
##  Mean   :0.4027                              Mean   :0.139903    
##  3rd Qu.:0.4045                              3rd Qu.:0.179535    
##  Max.   :0.5172                              Max.   :0.676162    
##  Accounts Receivable Turnover Average Collection Days
##  Min.   :0.000e+00            Min.   :0.000e+00      
##  1st Qu.:0.000e+00            1st Qu.:0.000e+00      
##  Median :0.000e+00            Median :0.000e+00      
##  Mean   :2.082e+06            Mean   :1.337e+07      
##  3rd Qu.:0.000e+00            3rd Qu.:0.000e+00      
##  Max.   :1.420e+09            Max.   :8.370e+09      
##  Inventory Turnover Rate (times) Fixed Assets Turnover Frequency
##  Min.   :0.000e+00               Min.   :0.000e+00              
##  1st Qu.:0.000e+00               1st Qu.:0.000e+00              
##  Median :0.000e+00               Median :0.000e+00              
##  Mean   :2.020e+09               Mean   :9.392e+08              
##  3rd Qu.:4.125e+09               3rd Qu.:0.000e+00              
##  Max.   :9.940e+09               Max.   :9.680e+09              
##  Net Worth Turnover Rate (times) Revenue per person Operating profit per person
##  Min.   :0.009194                Min.   :0.000141   Min.   :0.3073             
##  1st Qu.:0.021613                1st Qu.:0.011259   1st Qu.:0.3925             
##  Median :0.030000                Median :0.019144   Median :0.3955             
##  Mean   :0.037704                Mean   :0.034580   Mean   :0.4002             
##  3rd Qu.:0.043226                3rd Qu.:0.033293   3rd Qu.:0.4020             
##  Max.   :0.327581                Max.   :0.473989   Max.   :0.8971             
##  Allocation rate per person Working Capital to Total Assets
##  Min.   :0.000e+00          Min.   :0.6524                 
##  1st Qu.:0.000e+00          1st Qu.:0.7789                 
##  Median :0.000e+00          Median :0.8137                 
##  Mean   :2.563e+07          Mean   :0.8172                 
##  3rd Qu.:0.000e+00          3rd Qu.:0.8551                 
##  Max.   :9.570e+09          Max.   :1.0000                 
##  Quick Assets/Total Assets Current Assets/Total Assets Cash/Total Assets 
##  Min.   :0.01245           Min.   :0.02693             Min.   :0.000433  
##  1st Qu.:0.25308           1st Qu.:0.36765             1st Qu.:0.034489  
##  Median :0.38068           Median :0.51120             Median :0.079526  
##  Mean   :0.40481           Mean   :0.52628             Mean   :0.127139  
##  3rd Qu.:0.53725           3rd Qu.:0.68536             3rd Qu.:0.160944  
##  Max.   :0.98894           Max.   :0.99545             Max.   :0.925018  
##  Quick Assets/Current Liability Cash/Current Liability
##  Min.   :0.000e+00              Min.   :0.000e+00     
##  1st Qu.:0.000e+00              1st Qu.:0.000e+00     
##  Median :0.000e+00              Median :0.000e+00     
##  Mean   :1.194e+07              Mean   :4.826e+07     
##  3rd Qu.:0.000e+00              3rd Qu.:0.000e+00     
##  Max.   :8.140e+09              Max.   :8.590e+09     
##  Current Liability to Assets Operating Funds to Liability
##  Min.   :0.003873            Min.   :0.0000              
##  1st Qu.:0.052810            1st Qu.:0.3419              
##  Median :0.081007            Median :0.3493              
##  Mean   :0.088429            Mean   :0.3554              
##  3rd Qu.:0.113723            3rd Qu.:0.3618              
##  Max.   :0.258370            Max.   :0.9564              
##  Inventory/Working Capital Inventory/Current Liability
##  Min.   :0.2541            Min.   :0.000e+00          
##  1st Qu.:0.2770            1st Qu.:0.000e+00          
##  Median :0.2772            Median :0.000e+00          
##  Mean   :0.2774            Mean   :2.017e+07          
##  3rd Qu.:0.2774            3rd Qu.:0.000e+00          
##  Max.   :0.3424            Max.   :6.370e+09          
##  Current Liabilities/Liability Working Capital/Equity
##  Min.   :0.1104                Min.   :0.6126        
##  1st Qu.:0.6260                1st Qu.:0.7339        
##  Median :0.8017                Median :0.7362        
##  Mean   :0.7602                Mean   :0.7357        
##  3rd Qu.:0.9376                3rd Qu.:0.7385        
##  Max.   :1.0000                Max.   :0.7476        
##  Current Liabilities/Equity Long-term Liability to Current Assets
##  Min.   :0.3263             Min.   :0.000e+00                    
##  1st Qu.:0.3281             1st Qu.:0.000e+00                    
##  Median :0.3296             Median :0.000e+00                    
##  Mean   :0.3317             Mean   :7.241e+07                    
##  3rd Qu.:0.3321             3rd Qu.:0.000e+00                    
##  Max.   :0.5261             Max.   :9.310e+09                    
##  Retained Earnings to Total Assets Total income/Total expense
##  Min.   :0.6840                    Min.   :0.000772          
##  1st Qu.:0.9318                    1st Qu.:0.002237          
##  Median :0.9375                    Median :0.002331          
##  Mean   :0.9351                    Mean   :0.002397          
##  3rd Qu.:0.9447                    3rd Qu.:0.002489          
##  Max.   :0.9940                    Max.   :0.017451          
##  Total expense/Assets Current Asset Turnover Rate Quick Asset Turnover Rate
##  Min.   :0.00199      Min.   :0.000e+00           Min.   :0.000e+00        
##  1st Qu.:0.01436      1st Qu.:0.000e+00           1st Qu.:0.000e+00        
##  Median :0.02265      Median :0.000e+00           Median :0.000e+00        
##  Mean   :0.02903      Mean   :1.273e+09           Mean   :2.147e+09        
##  3rd Qu.:0.03566      3rd Qu.:0.000e+00           3rd Qu.:5.165e+09        
##  Max.   :0.17126      Max.   :9.960e+09           Max.   :9.970e+09        
##  Working capitcal Turnover Rate Cash Turnover Rate  Cash Flow to Sales
##  Min.   :0.5934                 Min.   :0.000e+00   Min.   :0.6675    
##  1st Qu.:0.5939                 1st Qu.:0.000e+00   1st Qu.:0.6716    
##  Median :0.5940                 Median :1.095e+09   Median :0.6716    
##  Mean   :0.5940                 Mean   :2.518e+09   Mean   :0.6716    
##  3rd Qu.:0.5940                 3rd Qu.:4.670e+09   3rd Qu.:0.6716    
##  Max.   :0.6054                 Max.   :1.000e+10   Max.   :0.6760    
##  Fixed Assets to Assets Current Liability to Liability
##  Min.   :0.0001865      Min.   :0.1104                
##  1st Qu.:0.0881918      1st Qu.:0.6260                
##  Median :0.2045842      Median :0.8017                
##  Mean   :0.2494363      Mean   :0.7602                
##  3rd Qu.:0.3733647      3rd Qu.:0.9376                
##  Max.   :0.8493070      Max.   :1.0000                
##  Current Liability to Equity Equity to Long-term Liability
##  Min.   :0.3263              Min.   :0.1109               
##  1st Qu.:0.3281              1st Qu.:0.1109               
##  Median :0.3296              Median :0.1123               
##  Mean   :0.3317              Mean   :0.1160               
##  3rd Qu.:0.3321              3rd Qu.:0.1169               
##  Max.   :0.5261              Max.   :0.2770               
##  Cash Flow to Total Assets Cash Flow to Liability CFO to Assets   
##  Min.   :0.4181            Min.   :0.0000         Min.   :0.2542  
##  1st Qu.:0.6316            1st Qu.:0.4569         1st Qu.:0.5697  
##  Median :0.6454            Median :0.4598         Median :0.5951  
##  Mean   :0.6506            Mean   :0.4620         Mean   :0.5965  
##  3rd Qu.:0.6665            3rd Qu.:0.4649         3rd Qu.:0.6255  
##  Max.   :1.0000            Max.   :0.9051         Max.   :0.8340  
##  Cash Flow to Equity Current Liability to Current Assets Liability-Assets Flag
##  Min.   :0.2662      Min.   :0.0008338                   Min.   :0            
##  1st Qu.:0.3127      1st Qu.:0.0170451                   1st Qu.:0            
##  Median :0.3150      Median :0.0261680                   Median :0            
##  Mean   :0.3156      Mean   :0.0295904                   Mean   :0            
##  3rd Qu.:0.3178      3rd Qu.:0.0365454                   3rd Qu.:0            
##  Max.   :0.3566      Max.   :0.2571601                   Max.   :0            
##  Net Income to Total Assets Total assets to GNP price No-credit Interval
##  Min.   :0.5745             Min.   :0.000e+00         Min.   :0.0000    
##  1st Qu.:0.7970             1st Qu.:0.000e+00         1st Qu.:0.6236    
##  Median :0.8104             Median :0.000e+00         Median :0.6239    
##  Mean   :0.8079             Mean   :2.199e+07         Mean   :0.6231    
##  3rd Qu.:0.8258             3rd Qu.:0.000e+00         3rd Qu.:0.6241    
##  Max.   :0.9829             Max.   :8.140e+09         Max.   :0.9564    
##  Gross Profit to Sales Net Income to Stockholder's Equity Liability to Equity
##  Min.   :0.5127        Min.   :0.6376                     Min.   :0.2748     
##  1st Qu.:0.6000        1st Qu.:0.8401                     1st Qu.:0.2768     
##  Median :0.6064        Median :0.8411                     Median :0.2786     
##  Mean   :0.6080        Mean   :0.8401                     Mean   :0.2807     
##  3rd Qu.:0.6136        3rd Qu.:0.8423                     3rd Qu.:0.2815     
##  Max.   :0.6651        Max.   :0.8496                     Max.   :0.4843     
##  Degree of Financial Leverage (DFL)
##  Min.   :0.004429                  
##  1st Qu.:0.026791                  
##  Median :0.026808                  
##  Mean   :0.026965                  
##  3rd Qu.:0.026897                  
##  Max.   :0.051601                  
##  Interest Coverage Ratio (Interest expense to EBIT) Net Income Flag
##  Min.   :0.1721                                     Min.   :1      
##  1st Qu.:0.5652                                     1st Qu.:1      
##  Median :0.5653                                     Median :1      
##  Mean   :0.5642                                     Mean   :1      
##  3rd Qu.:0.5657                                     3rd Qu.:1      
##  Max.   :0.6325                                     Max.   :1      
##  Equity to Liability     class        
##  Min.   :0.01069     Min.   :0.00000  
##  1st Qu.:0.02453     1st Qu.:0.00000  
##  Median :0.03463     Median :0.00000  
##  Mean   :0.05197     Mean   :0.02786  
##  3rd Qu.:0.05559     3rd Qu.:0.00000  
##  Max.   :0.88102     Max.   :1.00000

Data scaling and outlier removal

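The ratio of bankrupt to non-bankrupt firms is similar to that in the Polish data: in the summary above, the mean of the class variable is 0.02786, i.e. roughly 2.8% of the firms went bankrupt.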
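The report does not show how the data were scaled or which outliers were removed. The R sketch below is one common approach under assumptions of our own: an IQR rule (k = 1.5) for outliers, applied before z-score standardization so that extreme values do not distort the means and standard deviations. The helper names remove_outliers_iqr and scale_features are hypothetical. Columns with an interquartile range of 0 are skipped, because several columns in the summary above (for example Quick Ratio or Total debt/Total net worth, whose first and third quartiles are both 0) would otherwise lose every non-zero value; constant columns are left unscaled. The class ratio is recomputed afterwards, since outlier filtering can remove bankrupt firms disproportionately.

```r
# Sketch only: the IQR rule and z-scores are assumptions, not the author's method.

# Hypothetical helper: drop rows where any feature falls outside
# [Q1 - k * IQR, Q3 + k * IQR]; missing values are not treated as outliers.
remove_outliers_iqr <- function(df, features, k = 1.5) {
  keep <- rep(TRUE, nrow(df))
  for (col in features) {
    x <- df[[col]]
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    # With IQR = 0 the rule would discard every value different from Q1.
    if (is.na(iqr) || iqr == 0) next
    inside <- x >= q[1] - k * iqr & x <= q[2] + k * iqr
    keep <- keep & (is.na(x) | inside)
  }
  df[keep, , drop = FALSE]
}

# Hypothetical helper: z-score standardization of the selected columns;
# constant columns (sd = 0) are left unchanged to avoid division by zero.
scale_features <- function(df, features) {
  for (col in features) {
    x <- df[[col]]
    s <- sd(x, na.rm = TRUE)
    if (!is.na(s) && s > 0) df[[col]] <- (x - mean(x, na.rm = TRUE)) / s
  }
  df
}

# Synthetic demo data (not the bankruptcy dataset): two planted outliers in `a`.
set.seed(1)
demo <- data.frame(
  a = c(rnorm(98), 50, -40),
  b = runif(100),
  class = c(rep(0L, 95), rep(1L, 5))
)
features <- setdiff(names(demo), "class")

cleaned <- remove_outliers_iqr(demo, features)
scaled <- scale_features(cleaned, features)

# Share of non-bankrupt (0) and bankrupt (1) firms after cleaning.
ratio <- prop.table(table(scaled$class))
print(ratio)
```

In the demo, the two planted outliers both belong to class 1, which shows how outlier removal alone can shift the class ratio.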

Correlation

The variables are sorted by decreasing correlation with the class variable. The correlation values are considerably higher than for the previous (Polish) dataset, but they all remain below 0.5.
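A minimal sketch of such a ranking, assuming Pearson correlation between each feature and the 0/1 class variable (for a binary variable this equals the point-biserial correlation) and sorting by absolute value so that strong negative associations are not pushed to the end. The report loads polycor, so the author may have used polyserial or heterogeneous correlations (polycor::polyserial, polycor::hetcor) instead; the helper cor_with_target is hypothetical. Constant columns are skipped: in the summary above, Net Income Flag is 1 and Liability-Assets Flag is 0 for every firm, so their correlation with class is undefined.

```r
# Hypothetical helper: Pearson correlation of every feature with a binary
# target (point-biserial), sorted by decreasing absolute value.
cor_with_target <- function(df, target = "class") {
  features <- setdiff(names(df), target)
  # Constant columns have sd = 0, so their correlation is undefined (NA).
  varying <- vapply(df[features],
                    function(x) isTRUE(sd(x, na.rm = TRUE) > 0),
                    logical(1))
  features <- features[varying]
  r <- vapply(df[features],
              function(x) cor(x, df[[target]], use = "complete.obs"),
              numeric(1))
  r[order(abs(r), decreasing = TRUE)]
}

# Synthetic demo data (not the bankruptcy dataset).
set.seed(2)
n <- 200
cls <- rbinom(n, 1, 0.1)
demo <- data.frame(
  strong = cls + rnorm(n, sd = 0.5),   # clearly related to class
  weak = -0.2 * cls + rnorm(n),        # weakly, negatively related
  noise = rnorm(n),                    # unrelated
  flag = 1,                            # constant, like Net Income Flag
  class = cls
)

res <- cor_with_target(demo)
print(round(res, 3))
```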

Correlation between variables

The explanatory variables are, however, strongly correlated with one another, so principal component analysis (PCA) could be an appropriate way to reduce dimensionality.
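The check for strongly correlated predictors and the suggested PCA step can be sketched as follows. The helpers drop_constant, high_cor_pairs and pca_reduce, the |r| > 0.9 threshold, and the rule of keeping enough components to explain 90% of the variance are all assumptions for illustration. Candidate pairs are visible already in the summary above: Net Value Per Share (A) and (C), Current Liability to Liability and Current Liabilities/Liability, and Current Liability to Equity and Current Liabilities/Equity have identical summary statistics, which suggests (but does not prove) duplicated columns. Constant columns are dropped first, because prcomp() with scale. = TRUE cannot rescale a zero-variance column.

```r
# Hypothetical helper: keep only columns with non-zero variance.
drop_constant <- function(df) {
  df[vapply(df, function(x) isTRUE(sd(x, na.rm = TRUE) > 0), logical(1))]
}

# Hypothetical helper: feature pairs whose absolute correlation exceeds
# `threshold` (each pair listed once, taken from the upper triangle).
high_cor_pairs <- function(df, threshold = 0.9) {
  m <- cor(drop_constant(df), use = "pairwise.complete.obs")
  idx <- which(abs(m) > threshold & upper.tri(m), arr.ind = TRUE)
  data.frame(var1 = rownames(m)[idx[, 1]],
             var2 = colnames(m)[idx[, 2]],
             r = m[idx],
             stringsAsFactors = FALSE)
}

# Hypothetical helper: PCA on standardized features, keeping the smallest
# number of components whose cumulative explained variance reaches
# `var_share`. Note that prcomp() does not accept missing values.
pca_reduce <- function(df, var_share = 0.9) {
  pca <- prcomp(drop_constant(df), center = TRUE, scale. = TRUE)
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_comp <- which(explained >= var_share)[1]
  list(pca = pca, explained = explained, n_comp = n_comp,
       scores = pca$x[, seq_len(n_comp), drop = FALSE])
}

# Synthetic demo data (not the bankruptcy dataset).
set.seed(3)
n <- 300
base <- rnorm(n)
demo <- data.frame(
  x1 = base + rnorm(n, sd = 0.05),
  x2 = base + rnorm(n, sd = 0.05),  # near-duplicate of x1
  x3 = rnorm(n),
  flag = 0                          # constant, dropped before cor()/prcomp()
)

cor_pairs <- high_cor_pairs(demo)
print(cor_pairs)
res <- pca_reduce(demo)
print(round(res$explained, 3))
print(res$n_comp)
```

The correlation matrix itself can be drawn with corrplot::corrplot(), the package loaded at the top of the report, for instance with order = "hclust" to group correlated variables. On the real data, the class variable should be excluded from the data frame passed to pca_reduce(), and the retained scores would then replace the original features in the model.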